Abstract

The data-complexity of both satisfiability and finite satisfiability for the
two-variable fragment with counting is NP-complete; the data-complexity
of both query-answering and finite query-answering for the two-variable
guarded fragment with counting is co-NP-complete.
1 Introduction

Let $\varphi$ be a sentence (i.e. a formula with no free variables) in some logical
fragment, $\psi(\bar{y})$ a formula with free variables $\bar{y}$, $\Delta$ a set of ground, function-free
literals, and $\bar{a}$ a tuple of individual constants with the same arity as $\bar{y}$. We are
to think of $\Delta$ as being a body of data, $\varphi$ a background theory, and $\psi(\bar{a})$ a query
which we wish to answer. That answer should be positive just in case $\Delta \cup \{\varphi\}$
entails $\psi(\bar{a})$. What is the computational complexity of our task?

A fair reply depends on what, precisely, we take the inputs to our problem
to be. For, in practice, the background theory $\varphi$ is static, and the query $\psi(\bar{y})$
small: only the database $\Delta$, which is devoid of logical complexity, is large and
indefinitely extensible. Accordingly, we define the query-answering problem with
respect to $\varphi$ and $\psi(\bar{y})$ as follows: given a set $\Delta$ of ground, function-free literals
and a tuple $\bar{a}$ of individual constants with the same arity as $\bar{y}$, determine whether
$\Delta \cup \{\varphi\}$ entails $\psi(\bar{a})$. Similarly, we define the finite query-answering problem
with respect to $\varphi$ and $\psi(\bar{y})$ as follows: given $\Delta$ and $\bar{a}$, determine whether $\Delta \cup \{\varphi\}$
entails $\psi(\bar{a})$ under the additional assumption that the domain of quantification
is finite. The computational complexity of (finite) query-answering problems
is typically lower than that of the corresponding entailment problem in which
all the components are treated, on a par, as input. From a theoretical point of
view, it is natural to consider the special case where $\psi(\bar{y})$ is the falsum. Taking
complements, we define the satisfiability problem with respect to $\varphi$ as follows:
given a set $\Delta$ of ground, function-free literals, determine whether $\Delta \cup \{\varphi\}$ is
satisfiable. Likewise, we define the finite satisfiability problem with respect to $\varphi$
as follows: given $\Delta$, determine whether $\Delta \cup \{\varphi\}$ is finitely satisfiable.



The complexity of these problems depends, of course, on the logical fragments
to which $\varphi$ and $\psi(\bar{y})$ are assumed to belong. It is common practice to take
$\psi(\bar{y})$ to be a positive conjunctive query, that is, a formula of the form $\exists \bar{x}\, \pi(\bar{x}, \bar{y})$,
where $\pi(\bar{x}, \bar{y})$ is a conjunction of atoms featuring no function-symbols. This
restriction is motivated by the prevalence of database query-languages, such as,
for example, SQL, in which the simplest and most natural queries have
precisely this form. By contrast, the choice of logical fragment for $\varphi$ is much less
constrained: in principle, it makes sense to consider almost any set of formulas
for this purpose. Once we have identified a logic $\mathcal{L}$ from which to choose $\varphi$, we
can obtain bounds on the complexity of the (finite) satisfiability problem and
the (finite) query-answering problem with respect to any sentence $\varphi$ in $\mathcal{L}$ and
any positive conjunctive query $\psi(\bar{y})$. These complexity bounds are collectively
referred to as data complexity bounds for $\mathcal{L}$.

In this paper, we analyse the data complexity of two expressive fragments
of first-order logic for which the complexity of satisfiability and finite
satisfiability has recently been determined: the two-variable fragment with counting
quantifiers, denoted $\mathcal{C}^2$, and the two-variable guarded fragment with counting
quantifiers, denoted $\mathcal{GC}^2$. We show that the satisfiability and finite
satisfiability problems with respect to any $\mathcal{C}^2$-formula are in NP, and that the
query-answering and finite query-answering problems with respect to any $\mathcal{GC}^2$-formula
and any positive conjunctive query are in co-NP. We show that these bounds
are the best possible, and that the query-answering and finite query-answering
problems with respect to a $\mathcal{C}^2$-formula and a positive conjunctive query are in
general undecidable. The data complexity of various logical fragments with
counting quantifiers has been investigated in the literature (see, for example,
Hustadt et al. [6], Glimm et al. [4], Ortiz et al. and Artale et al. [1]).
However, this is the first time that such results have been established for the large
(and mathematically natural) fragments $\mathcal{C}^2$ and $\mathcal{GC}^2$. In addition, the proofs
in this paper are based ultimately on the technique of reduction to Presburger
arithmetic, which is novel in this context.

2 Preliminaries

We employ the standard apparatus of first-order logic (assumed to contain the
equality predicate $\approx$) augmented with the counting quantifiers $\exists_{\leq C}$, $\exists_{\geq C}$ and
$\exists_{=C}$ (for $C \geq 0$), which we interpret in the obvious way. The predicate calculus
with counting, denoted $\mathcal{C}$, is the set of first-order formulas with counting
quantifiers, over a purely relational signature. The two-variable fragment with
counting, denoted $\mathcal{C}^2$, is the fragment of $\mathcal{C}$ involving only the variables $x$ and
$y$, and only unary or binary predicates. If $r$ is any binary predicate (including
$\approx$), we call an atomic formula having either of the forms $r(x, y)$ or $r(y, x)$ a
guard. Note that guards, by definition, contain two distinct variables. The
two-variable guarded fragment with counting, denoted $\mathcal{GC}^2$, is the smallest set of
formulas satisfying the following conditions:

1. $\mathcal{GC}^2$ contains all atomic formulas, and is closed under Boolean
combinations;

2. if $\varphi$ is a formula of $\mathcal{GC}^2$ with at most one free variable, and $u$ is a variable
(i.e. either $x$ or $y$), then the formulas $\forall u \varphi$ and $\exists u \varphi$ are in $\mathcal{GC}^2$;

3. if $\varphi$ is a formula of $\mathcal{GC}^2$, $\gamma$ a guard, $u$ a variable, and $Q$ any of the quantifiers
$\exists$, $\exists_{\leq C}$, $\exists_{\geq C}$, $\exists_{=C}$ (for $C > 0$), then the formulas $\forall u(\gamma \to \varphi)$, $Q u(\gamma \wedge \varphi)$
and $Q u \gamma$ are in $\mathcal{GC}^2$.

For example,

$$\exists_{\leq 1} x(\mathrm{professor}(x) \wedge \exists_{\geq 4} y(\mathrm{supervises}(x, y) \wedge \mathrm{grad\_student}(y))) \qquad (1)$$

is a $\mathcal{C}^2$-sentence, with the informal reading: At most one professor supervises more
than three graduate students. Likewise,

$$\neg\exists x(\mathrm{professor}(x) \wedge \exists_{\geq 41} y(\mathrm{supervises}(x, y) \wedge \mathrm{grad\_student}(y)))$$

is a $\mathcal{GC}^2$-sentence, with the informal reading: No professor supervises more than
forty graduate students. However, (1) is not in the fragment $\mathcal{GC}^2$, because the
quantifier $\exists_{\leq 1}$ does not occur in a guarded pattern. It will be convenient in
the sequel to consider the following smaller fragments. We take $\mathcal{L}^{2-}$ to be the
fragment of $\mathcal{C}^2$ in which no counting quantifiers and no instances of $\approx$ occur;
likewise, we take $\mathcal{G}^{2-}$ to be the fragment of $\mathcal{GC}^2$ in which no counting quantifiers
and no instances of $\approx$ occur. Evidently, $\mathcal{G}^{2-} \subseteq \mathcal{L}^{2-}$.
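To make the reading of the counting quantifiers concrete, the following minimal sketch evaluates sentence (1) over a small finite structure. The dictionary representation of a structure, the function names and the sample data are assumptions made purely for illustration; they are not part of the paper's formal apparatus.

```python
def count_supervised_students(structure, x):
    """Count the y with supervises(x, y) and grad_student(y)."""
    return sum(
        1
        for y in structure["domain"]
        if (x, y) in structure["supervises"] and y in structure["grad_student"]
    )


def sentence_1_holds(structure):
    """Sentence (1): at most one professor has at least four supervised graduate students."""
    busy = [
        x
        for x in structure["domain"]
        if x in structure["professor"] and count_supervised_students(structure, x) >= 4
    ]
    return len(busy) <= 1


# Hypothetical data: p1 supervises four students, p2 supervises one.
students = {f"s{i}" for i in range(8)}
example = {
    "domain": {"p1", "p2"} | students,
    "professor": {"p1", "p2"},
    "grad_student": students,
    "supervises": {("p1", f"s{i}") for i in range(4)} | {("p2", "s4")},
}
print(sentence_1_holds(example))
```

The inner count plays the role of $\exists_{\geq 4} y$ and the outer length test plays the role of $\exists_{\leq 1} x$; giving p2 three more students would make the sentence false.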

Both $\mathcal{C}^2$ and $\mathcal{GC}^2$ lack the finite model property. The satisfiability and finite
satisfiability problems for $\mathcal{C}^2$ are both NEXPTIME-complete (Pratt-Hartmann
[11]; see also Pacholski et al. [10]); the satisfiability and finite satisfiability
problems for $\mathcal{GC}^2$ are both EXPTIME-complete (Kazakov [7], Pratt-Hartmann [12]).
In the context of $\mathcal{C}^2$ and $\mathcal{GC}^2$, predicates of arities other than 1 or 2 lead to no
interesting increase in expressive power. Adding individual constants to $\mathcal{C}^2$
likewise leads to no interesting increase in expressive power, and no increase
in complexity, since occurrences of any constant $c$ can be simulated with a
unary predicate $p_c$ in the presence of the $\mathcal{C}^2$-formula $\exists_{=1} x\, p_c(x)$. On the other
hand, adding even a single individual constant to $\mathcal{GC}^2$ results in a fragment with
NEXPTIME-complete satisfiability and finite satisfiability problems. Thus, it
is most convenient to assume these fragments to be constant-free; and that is
what we shall do in the sequel.

A positive conjunctive query (or, simply: query) is a formula $\psi(\bar{y})$ of the form
$\exists \bar{x}(\alpha_1(\bar{x}, \bar{y}) \wedge \cdots \wedge \alpha_n(\bar{x}, \bar{y}))$, where $n \geq 1$ and, for all $i$ ($1 \leq i \leq n$), $\alpha_i(\bar{x}, \bar{y})$
is an atomic formula whose predicate is not $\approx$ and whose arguments are all
variables occurring in $\bar{x}, \bar{y}$. Since we shall be interested in answering queries in
the presence of $\mathcal{C}^2$- or $\mathcal{GC}^2$-formulas, there is little to be gained from allowing
$\psi(\bar{y})$ to contain predicates of arity greater than 2; in the sequel, therefore, we
assume that all predicates in positive conjunctive queries are unary or binary.
An instance of $\psi(\bar{y})$ is simply the corresponding formula $\psi(\bar{a})$, where $\bar{a}$ is a tuple
of constants. We allow the tuples $\bar{x}$ and $\bar{y}$ to be empty. Allowing individual
constants to appear in positive conjunctive queries does not essentially change
the problem; in the sequel, therefore, we assume positive conjunctive queries to
be constant-free.

Definition 1. If $\varphi$ is a sentence (in any logic), define $\mathcal{S}_\varphi$ to be the following
problem:

Given a finite set of ground, function-free literals $\Delta$, is $\Delta \cup \{\varphi\}$
satisfiable?

Likewise, define $\mathcal{FS}_\varphi$ to be the following problem:

Given a finite set of ground, function-free literals $\Delta$, is $\Delta \cup \{\varphi\}$
finitely satisfiable?

We call $\mathcal{S}_\varphi$ the satisfiability problem with respect to $\varphi$, and $\mathcal{FS}_\varphi$ the finite
satisfiability problem with respect to $\varphi$.

Definition 2. If $\varphi$ is a sentence and $\psi(\bar{y})$ a formula (in any logic) having no
free variables apart from $\bar{y}$, define $\mathcal{Q}_{\varphi,\psi(\bar{y})}$ to be the following problem:

Given a finite set of ground, function-free literals $\Delta$ and a tuple of
constants $\bar{a}$ of the same arity as $\bar{y}$, does $\Delta \cup \{\varphi\}$ entail $\psi(\bar{a})$?

Likewise, define $\mathcal{FQ}_{\varphi,\psi(\bar{y})}$ to be the following problem:

Given a finite set of ground, function-free literals $\Delta$ and a tuple of
constants $\bar{a}$ of the same arity as $\bar{y}$, is $\psi(\bar{a})$ true in every finite model
of $\Delta \cup \{\varphi\}$?

We call $\mathcal{Q}_{\varphi,\psi(\bar{y})}$ the query-answering problem with respect to $\varphi$ and $\psi(\bar{y})$, and
$\mathcal{FQ}_{\varphi,\psi(\bar{y})}$ the finite query-answering problem with respect to $\varphi$ and $\psi(\bar{y})$.

Answering queries is at least as hard as deciding unsatisfiability: if $p$ is any
unary predicate not occurring in $\Delta$ or $\varphi$, then $\Delta \cup \{\varphi\} \models \exists x\, p(x)$ if and only if
$\Delta \cup \{\varphi\}$ is unsatisfiable. Similarly for the finite case.

We establish the following complexity results. For any $\mathcal{C}^2$-sentence $\varphi$, both
$\mathcal{S}_\varphi$ and $\mathcal{FS}_\varphi$ are in NP. These bounds are tight in the sense that there exists
a $\mathcal{C}^2$-sentence (in fact, a $\mathcal{G}^{2-}$-sentence) $\varphi$ such that the problems $\mathcal{S}_\varphi$ and $\mathcal{FS}_\varphi$
coincide, and are NP-hard. The query-answering problem for $\mathcal{C}^2$ is of little
interest from a complexity-theoretic point of view: there exist a $\mathcal{C}^2$-sentence $\varphi$
and a positive conjunctive query $\psi(\bar{y})$ such that $\mathcal{Q}_{\varphi,\psi(\bar{y})}$ is undecidable;
similarly for $\mathcal{FQ}_{\varphi,\psi(\bar{y})}$. However, by restricting attention to $\mathcal{GC}^2$, we restore upper
complexity bounds comparable to those for $\mathcal{S}_\varphi$ and $\mathcal{FS}_\varphi$: for any $\mathcal{GC}^2$-sentence
$\varphi$ and any positive conjunctive query $\psi(\bar{y})$, both $\mathcal{Q}_{\varphi,\psi(\bar{y})}$ and $\mathcal{FQ}_{\varphi,\psi(\bar{y})}$ are in
co-NP. Again, the fact that there exists a $\mathcal{G}^{2-}$-sentence $\varphi$ for which $\mathcal{S}_\varphi$ ($= \mathcal{FS}_\varphi$)
is NP-hard means that these bounds are tight. The above results may be
informally expressed by saying: "The data-complexity of (finite) satisfiability for
$\mathcal{C}^2$ is NP-complete; the data-complexity of (finite) query-answering for $\mathcal{GC}^2$ is
co-NP-complete." These data-complexity bounds contrast with the complexity
bounds for satisfiability and finite satisfiability in the fragments $\mathcal{C}^2$ and $\mathcal{GC}^2$
mentioned above.

In the sequel, if $\varphi$ is a formula, $\|\varphi\|$ denotes the size of $\varphi$, measured in the
obvious way; similarly, if $\Phi$ is a set of formulas, $\|\Phi\|$ denotes the total size of $\Phi$.
If $X$ is any set, $|X|$ denotes the cardinality of $X$.

3 The fragment $\mathcal{C}^2$

In this section, we review some facts about the fragment $\mathcal{C}^2$, closely following the
analysis in Pratt-Hartmann [11]. We have simplified the original terminology
where, for the purposes of the present paper, certain complications regarding
the sizes of data-structures can be disregarded; and we have lightly reformulated
some of the lemmas accordingly.

Let $\Sigma$ be a signature of unary and binary predicates. A 1-type over $\Sigma$ is a
maximal consistent set of equality-free literals involving only the variable $x$. A
2-type over $\Sigma$ is a maximal consistent set of equality-free literals involving only
the variables $x$ and $y$. If $\mathfrak{A}$ is any structure interpreting $\Sigma$, and $a \in A$, then
there exists a unique 1-type $\pi(x)$ over $\Sigma$ such that $\mathfrak{A} \models \pi[a]$; we denote $\pi$ by
$\mathrm{tp}^{\mathfrak{A}}[a]$. If, in addition, $b \in A$ is distinct from $a$, then there exists a unique 2-type
$\tau(x, y)$ over $\Sigma$ such that $\mathfrak{A} \models \tau[a, b]$; we denote $\tau$ by $\mathrm{tp}^{\mathfrak{A}}[a, b]$. We do not define
$\mathrm{tp}^{\mathfrak{A}}[a, b]$ if $a = b$. If $\pi$ is a 1-type, we say that $\pi$ is realized in $\mathfrak{A}$ if there exists
$a \in A$ with $\mathrm{tp}^{\mathfrak{A}}[a] = \pi$. If $\tau$ is a 2-type, we say that $\tau$ is realized in $\mathfrak{A}$ if there
exist distinct $a, b \in A$ with $\mathrm{tp}^{\mathfrak{A}}[a, b] = \tau$.

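Notation 1. Let $\tau$ be a 2-type over a purely relational signature $\Sigma$. The result
of transposing the variables $x$ and $y$ in $\tau$ is also a 2-type, denoted $\tau^{-1}$; the set of
literals in $\tau$ not featuring the variable $y$ is a 1-type, denoted $\mathrm{tp}_1(\tau)$; likewise, the
set of literals in $\tau$ not featuring the variable $x$ is (after replacing $y$ by $x$) also a
1-type, denoted $\mathrm{tp}_2(\tau)$.

Remark 1. If $\tau$ is any 2-type over a purely relational signature $\Sigma$, then $\mathrm{tp}_2(\tau) =
\mathrm{tp}_1(\tau^{-1})$. If $\mathfrak{A}$ is a structure interpreting $\Sigma$, and $a, b$ are distinct elements of
$A$ such that $\mathrm{tp}^{\mathfrak{A}}[a, b] = \tau$, then $\mathrm{tp}^{\mathfrak{A}}[b, a] = \tau^{-1}$, $\mathrm{tp}^{\mathfrak{A}}[a] = \mathrm{tp}_1(\tau)$ and $\mathrm{tp}^{\mathfrak{A}}[b] = \mathrm{tp}_2(\tau)$.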
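The notions of 1-type, 2-type, $\tau^{-1}$, $\mathrm{tp}_1$ and $\mathrm{tp}_2$ can be illustrated with a minimal sketch over finite structures. The encoding is an assumption made here for illustration only: a literal is a triple (predicate, variable tuple, truth value), a type is a frozenset of such triples, and a structure is a dictionary of unary and binary extensions.

```python
from itertools import product


def one_type(s, a):
    """The 1-type of a: truth values of all literals p(x) and r(x, x)."""
    lits = {(p, ("x",), a in ext) for p, ext in s["unary"].items()}
    lits |= {(r, ("x", "x"), (a, a) in ext) for r, ext in s["binary"].items()}
    return frozenset(lits)


def two_type(s, a, b):
    """The 2-type of the ordered pair (a, b); undefined when a = b."""
    if a == b:
        raise ValueError("tp[a, b] is undefined when a = b")
    val = {"x": a, "y": b}
    lits = set()
    for p, ext in s["unary"].items():
        for v in "xy":
            lits.add((p, (v,), val[v] in ext))
    for r, ext in s["binary"].items():
        for u, v in product("xy", repeat=2):
            lits.add((r, (u, v), (val[u], val[v]) in ext))
    return frozenset(lits)


def inverse(tau):
    """Transpose the variables x and y throughout a 2-type."""
    swap = {"x": "y", "y": "x"}
    return frozenset((p, tuple(swap[v] for v in args), sign) for p, args, sign in tau)


def tp1(tau):
    """The literals of tau that do not feature y."""
    return frozenset(lit for lit in tau if "y" not in lit[1])


def tp2(tau):
    """The literals of tau that do not feature x, with y renamed to x."""
    return tp1(inverse(tau))


# Hypothetical structure with one unary and one binary predicate.
s = {"domain": {1, 2, 3}, "unary": {"p": {1}}, "binary": {"r": {(1, 2), (2, 2)}}}
tau = two_type(s, 1, 2)
print(two_type(s, 2, 1) == inverse(tau), one_type(s, 1) == tp1(tau), one_type(s, 2) == tp2(tau))
```

The three comparisons printed at the end are exactly the identities stated in Remark 1, checked on one pair of elements.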

Lemma 1. Let $\varphi$ be a $\mathcal{C}^2$-formula. There exist (i) a $\mathcal{C}^2$-formula $\alpha$ containing
no quantifiers and no occurrences of $\approx$, (ii) a list of positive integers $C_1, \ldots, C_m$
and (iii) a list of binary predicates $f_1, \ldots, f_m$, with the following property. If $\varphi^*$
is the $\mathcal{C}^2$-formula

$$\forall x \forall y(\alpha \vee x \approx y) \wedge \bigwedge_{1 \leq h \leq m} \forall x \exists_{=C_h} y(f_h(x, y) \wedge x \not\approx y), \qquad (2)$$

and $C = \max_h C_h$, then (i) $\varphi^* \models \varphi$, and (ii) any model of $\varphi$ over a domain
having at least $C + 1$ elements may be expanded to a model of $\varphi^*$.

Proof. Routine adaptation of standard techniques. See, e.g. Börger et al. [2],
p. 378. □
If $\Delta$ is a set of ground, function-free literals, and $\varphi$ and $\varphi^*$ are as in Lemma 1,
then $\Delta \cup \{\varphi\}$ evidently has a (finite) model if and only if either $\Delta \cup \{\varphi\}$ has a
model of size $C$ or less, or $\Delta \cup \{\varphi^*\}$ has a (finite) model.

Lemma 1 assures us that formulas of the form (2) are as general as we need.
So, for the remainder of this section, let us fix a formula $\varphi^*$ given by (2). The
predicates $f_1, \ldots, f_m$ will play a special role in the ensuing analysis. We refer to
them as the counting predicates. However, we stress that no special assumptions
are made about them: in particular, they can occur in arbitrary configurations
in the subformula $\alpha$.

Fix the constant $Z = (mC + 1)^2$. Let $\Sigma^*$ be the signature of $\varphi^*$ together with
$2\lceil \log Z \rceil + 1$ new unary predicates (i.e. not occurring in $\varphi^*$). Henceforth, $\Sigma^*$
will be implicit: thus, unless otherwise indicated, structure means "structure
interpreting $\Sigma^*$"; 1-type means "1-type over $\Sigma^*$"; 2-type means "2-type over
$\Sigma^*$"; and so on.

Definition 3. Let $\tau$ be a 2-type. We say that $\tau$ is a message-type if $f_h(x, y) \in \tau$
for some $h$ ($1 \leq h \leq m$). If $\tau$ is a message-type such that $\tau^{-1}$ is also a
message-type, we say that $\tau$ is invertible. On the other hand, if $\tau$ is a 2-type such that
neither $\tau$ nor $\tau^{-1}$ is a message-type, we say that $\tau$ is a silent 2-type. If $\tau$ is a 2-type
such that neither $q(x, y)$ nor $q(y, x)$ is in $\tau$ for any binary predicate $q$, we say that
$\tau$ is vacuous.

The terminology is meant to suggest the following imagery. Let $\mathfrak{A}$ be a
structure. If $\mathrm{tp}^{\mathfrak{A}}[a, b]$ is a message-type $\mu$, then we may imagine that $a$ sends a
message (of type $\mu$) to $b$. If $\mu$ is invertible, then $b$ replies by sending a message
(of type $\mu^{-1}$) back to $a$. If $\mathrm{tp}^{\mathfrak{A}}[a, b]$ is silent, then neither element sends a
message to the other. Note that every vacuous 2-type is by definition silent; but
the converse is not generally true.
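The four categories of Definition 3 can be sketched as simple membership tests. As an illustrative assumption, a 2-type is represented here only by its literals of the forms $r(x, y)$ and $r(y, x)$ (the unary literals and the literals $r(x, x)$, $r(y, y)$ play no role in the classification), each encoded as a triple (predicate, variable pair, truth value); the helper `make_two_type` and the predicate names are hypothetical.

```python
def make_two_type(binary_preds, positive):
    """Build the binary part of a 2-type from the set of literals that hold."""
    return frozenset(
        (r, args, (r, args) in positive)
        for r in binary_preds
        for args in (("x", "y"), ("y", "x"))
    )


def inverse(tau):
    """Transpose x and y throughout the 2-type."""
    swap = {"x": "y", "y": "x"}
    return frozenset((r, tuple(swap[v] for v in args), sign) for r, args, sign in tau)


def is_message_type(tau, counting):
    """tau contains f_h(x, y) for some counting predicate f_h."""
    return any((f, ("x", "y"), True) in tau for f in counting)


def is_invertible(tau, counting):
    return is_message_type(tau, counting) and is_message_type(inverse(tau), counting)


def is_silent(tau, counting):
    return not is_message_type(tau, counting) and not is_message_type(inverse(tau), counting)


def is_vacuous(tau):
    """No binary predicate holds between x and y in either direction."""
    return not any(sign and set(args) == {"x", "y"} for _, args, sign in tau)


# Hypothetical signature: f is a counting predicate, q is an ordinary binary predicate.
counting = {"f"}
preds = ["f", "q"]
one_way = make_two_type(preds, {("f", ("x", "y"))})
both_ways = make_two_type(preds, {("f", ("x", "y")), ("f", ("y", "x"))})
quiet = make_two_type(preds, {("q", ("x", "y"))})
empty = make_two_type(preds, set())
```

Here `quiet` witnesses the closing remark of the paragraph above: it is silent, because no counting predicate holds in either direction, but it is not vacuous, because $q(x, y)$ holds.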

For convenience, we decide upon some enumeration

$$\pi_1, \ldots, \pi_L$$

of the set of all 1-types, and some enumeration

$$\mu_1, \ldots, \mu_{M^*}, \mu_{M^*+1}, \ldots, \mu_M$$

of the set of all message-types, such that $\mu_j$ is invertible if $1 \leq j \leq M^*$, and
non-invertible if $M^* + 1 \leq j \leq M$. (That is: the invertible message-types are listed
first.) In addition, let $\Xi$ denote the set of silent 2-types. The above notation,
which will be used throughout this section, is summarized in Table 1.

We now introduce two notions necessary to state the key lemmas of this
section regarding the satisfiability of $\mathcal{C}^2$-formulas.

Definition 4. A structure $\mathfrak{A}$ is chromatic if distinct elements connected by a
chain of 1 or 2 invertible message-types have distinct 1-types. That is, $\mathfrak{A}$ is
chromatic just in case, for all $a, a', a'' \in A$:

1. if $a \neq a'$ and $\mathrm{tp}^{\mathfrak{A}}[a, a']$ is an invertible message-type, then $\mathrm{tp}^{\mathfrak{A}}[a] \neq \mathrm{tp}^{\mathfrak{A}}[a']$;
and

2. if $a, a', a''$ are pairwise distinct and both $\mathrm{tp}^{\mathfrak{A}}[a, a']$ and $\mathrm{tp}^{\mathfrak{A}}[a', a'']$ are
invertible message-types, then $\mathrm{tp}^{\mathfrak{A}}[a] \neq \mathrm{tp}^{\mathfrak{A}}[a'']$.

Symbol                          Definition

$Z$                             $(mC + 1)^2$
$\Sigma^*$                      signature of $\varphi^*$ together with $2\lceil \log Z \rceil + 1$ new unary predicates
$\pi_1, \ldots, \pi_L$          an enumeration of the 1-types over $\Sigma^*$
$\mu_1, \ldots, \mu_{M^*}$      an enumeration of the invertible message-types over $\Sigma^*$
$\mu_{M^*+1}, \ldots, \mu_M$    an enumeration of the non-invertible message-types over $\Sigma^*$
$\Xi$                           set of silent 2-types over $\Sigma^*$

Table 1: Quick reference guide to symbols defined with respect to Formula (2).

Remark 2. A structure is chromatic if and only if (i) no object sends an 
invertible message to any object having the same 1-type as itself; and (ii) no 
object sends invertible messages to any two objects having the same 1-type as 
each other. 

Definition 5. A structure $\mathfrak{A}$ is differentiated if, for every 1-type $\pi$, the number
$u$ of elements in $A$ having 1-type $\pi$ satisfies either $u \leq 1$ or $u > Z$.
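For finite structures, both properties can be checked directly. The sketch below uses the characterization in Remark 2 for chromaticity and Definition 5 for differentiation. Its inputs are assumptions made for illustration: `one_type` maps each element to a hashable label standing for its 1-type, and `invertible_pairs` is the set of ordered pairs $(a, b)$ for which the 2-type of $(a, b)$ is an invertible message-type (so it contains $(b, a)$ whenever it contains $(a, b)$).

```python
from collections import Counter


def is_chromatic(one_type, invertible_pairs):
    """Remark 2: (i) no invertible message between equal 1-types;
    (ii) no element sends invertible messages to two elements of equal 1-type."""
    targets_of = {}
    for a, b in invertible_pairs:
        if one_type[a] == one_type[b]:
            return False  # violates (i)
        targets_of.setdefault(a, []).append(b)
    for targets in targets_of.values():
        kinds = [one_type[b] for b in targets]
        if len(kinds) != len(set(kinds)):
            return False  # violates (ii)
    return True


def is_differentiated(one_type, Z):
    """Definition 5: every 1-type is realized at most once or more than Z times."""
    return all(u <= 1 or u > Z for u in Counter(one_type.values()).values())


# Hypothetical three-element structure.
types = {"a": "red", "b": "green", "c": "red"}
pairs = {("a", "b"), ("b", "a")}
print(is_chromatic(types, pairs), is_differentiated(types, 1))
```

Adding the pairs ("b", "c") and ("c", "b") would break chromaticity, since b would then exchange invertible messages with two elements of the same 1-type.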

By the Löwenheim-Skolem Theorem, we may confine attention in the sequel
to finite or countably infinite structures. The following (routine) lemma ensures 
that we may further confine attention to chromatic, differentiated structures of 
these cardinalities. 

Lemma 2. Suppose $\mathfrak{A} \models \varphi^*$. Then, by re-interpreting $2\lceil \log Z \rceil$ of the $2\lceil \log Z \rceil + 1$
unary predicates of $\Sigma^*$ not occurring in $\varphi^*$ if necessary, we can obtain a
chromatic, differentiated structure $\mathfrak{A}'$ over the same domain, such that $\mathfrak{A}' \models \varphi^*$.

Proof. Pratt-Hartmann [11], Lemmas 2 and 3. □

In the sequel, we shall need to record the cardinalities of various finite or
countably infinite sets. To this end, we let $\mathbb{N}^* = \mathbb{N} \cup \{\aleph_0\}$, and we extend the
ordering $>$ and the arithmetic operations $+$ and $\cdot$ from $\mathbb{N}$ to $\mathbb{N}^*$ in the obvious
way. Specifically, we define $\aleph_0 > n$ for all $n \in \mathbb{N}$; we define $\aleph_0 + \aleph_0 = \aleph_0 \cdot \aleph_0 = \aleph_0$
and $0 \cdot \aleph_0 = \aleph_0 \cdot 0 = 0$; we define $n + \aleph_0 = \aleph_0 + n = \aleph_0$ for all $n \in \mathbb{N}$; and we
define $n \cdot \aleph_0 = \aleph_0 \cdot n = \aleph_0$ for all $n \in \mathbb{N}$ such that $n > 0$. Under this extension,
$>$ remains a total order, and $+$, $\cdot$ remain associative and commutative.
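The extended arithmetic on $\mathbb{N}^*$ is easy to mirror in code. The sketch below is only an illustration and rests on one stated assumption: Python's floating-point infinity is used as a stand-in for $\aleph_0$. Addition and comparison then behave as required without further work, but multiplication needs a special case, because `0 * inf` is not a number in floating-point arithmetic, whereas the text stipulates $0 \cdot \aleph_0 = 0$.

```python
import math

ALEPH0 = math.inf  # stand-in for aleph_0 (an assumption of this sketch)


def add(m, n):
    """Addition on N*: any sum involving ALEPH0 is ALEPH0."""
    return m + n


def mul(m, n):
    """Multiplication on N*: zero annihilates even ALEPH0."""
    if m == 0 or n == 0:
        return 0
    return m * n


print(add(ALEPH0, 5), mul(0, ALEPH0), mul(3, ALEPH0))
```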

Our next task is to develop the means to talk about 'local configurations' in 
structures. 
Definition 6. A star-type is a pair $\sigma = (\pi, \bar{v})$, where $\pi$ is a 1-type, and $\bar{v} =
(v_1, \ldots, v_M)$ is an $M$-tuple over $\mathbb{N}^*$ satisfying the condition that, for all $j$
($1 \leq j \leq M$), $v_j > 0$ implies $\mathrm{tp}_1(\mu_j) = \pi$.

In this context, we denote $\pi$ by $\mathrm{tp}(\sigma)$ and $v_j$ by $\sigma[j]$. If $\mathfrak{A}$ is a finite or countably
infinite structure, and $a \in A$, we denote by $\mathrm{st}^{\mathfrak{A}}[a]$ the star-type $(\pi, (v_1, \ldots, v_M))$,
where $\pi = \mathrm{tp}^{\mathfrak{A}}[a]$ and

$$v_j = |\{b \in A \setminus \{a\} : \mathrm{tp}^{\mathfrak{A}}[a, b] = \mu_j\}|$$

for all $j$ ($1 \leq j \leq M$). We call $\mathrm{st}^{\mathfrak{A}}[a]$ the star-type of $a$ in $\mathfrak{A}$; and we say that a
star-type $\sigma$ is realized in $\mathfrak{A}$ if $\sigma = \mathrm{st}^{\mathfrak{A}}[a]$ for some $a \in A$.

We may think of $\mathrm{st}^{\mathfrak{A}}[a]$ as a description of the 'local environment' of $a$ in
$\mathfrak{A}$: it records, in addition to the 1-type of $a$ in $\mathfrak{A}$, the number of other elements
to which $a$ sends a message of type $\mu_j$, for each message-type $\mu_j$. Properties of
star-types realized in models capture 'local' information about those models.

Definition 7. Let $\sigma = (\pi, (v_1, \ldots, v_M))$ be a star-type. We say that $\sigma$ is
$D$-bounded, for $D$ a positive integer, if $\sigma[j] \leq D$ for all $j$ ($1 \leq j \leq M$). We say
that $\sigma$ is chromatic if, for every 1-type $\pi'$, the sum

$$c = \sum \{v_j \mid 1 \leq j \leq M^* \text{ and } \mathrm{tp}_2(\mu_j) = \pi'\}$$

satisfies $c \leq 1$, and satisfies $c = 0$ if $\pi' = \pi$. We say that a finite or countably
infinite structure $\mathfrak{A}$ is $D$-bounded if every star-type realized in $\mathfrak{A}$ is $D$-bounded.

Obviously, if $\mathfrak{A} \models \varphi^*$, then $\mathfrak{A}$ is $C$-bounded. Importantly, information about
the populations of star-types realized in models can tell us all that we need to
know about those models, from the point of view of the fragment $\mathcal{C}^2$.

Definition 8. Let $\mathfrak{A}$ be a finite or countably infinite structure, and let $\bar{\sigma} =
(\sigma_1, \ldots, \sigma_N)$ be a list of star-types. For all $k$ ($1 \leq k \leq N$), let $w_k \in \mathbb{N}^*$ be given
by

$$w_k = |\{a \in A \mid \mathrm{st}^{\mathfrak{A}}[a] = \sigma_k\}|.$$

The $\bar{\sigma}$-histogram of $\mathfrak{A}$, denoted $H_{\bar{\sigma}}(\mathfrak{A})$, is the $N$-tuple $(w_1, \ldots, w_N)$.

We may thus think of $H_{\bar{\sigma}}(\mathfrak{A})$ as a 'statistical profile' of $\mathfrak{A}$. For the next
definitions, recall (Table 1) that $\pi_1, \ldots, \pi_L$ is an enumeration of the 1-types,
and that $\Xi$ is the set of silent 2-types.
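For a finite structure, star-types and histograms can be computed by straightforward counting, as the following sketch shows. Its interface is an assumption made for illustration: `one_type` maps elements to labels standing for 1-types, `message` maps an ordered pair $(a, b)$ to the 0-based index $j$ of the message-type realized by that pair (pairs whose 2-type is not a message-type are simply absent), and `M` is the number of message-types. Countably infinite structures, which the definitions also cover, are outside the scope of this sketch.

```python
from collections import Counter


def star_type(a, domain, one_type, message, M):
    """Return (1-type of a, tuple counting the messages a sends, per message-type)."""
    counts = [0] * M
    for b in domain:
        if b == a:
            continue
        j = message.get((a, b))
        if j is not None:
            counts[j] += 1
    return (one_type[a], tuple(counts))


def histogram(sigma, domain, one_type, message, M):
    """The sigma-histogram: how many elements realize each listed star-type."""
    tally = Counter(star_type(a, domain, one_type, message, M) for a in domain)
    return tuple(tally[s] for s in sigma)


# Hypothetical data: a sends two messages of type 0, b sends one of type 1.
domain = ["a", "b", "c"]
one_type = {"a": "pi", "b": "pi", "c": "pi"}
message = {("a", "b"): 0, ("a", "c"): 0, ("b", "a"): 1}
sigma = [("pi", (2, 0)), ("pi", (0, 0)), ("pi", (1, 1))]
print(histogram(sigma, domain, one_type, message, 2))
```

A star-type in the list that no element realizes, such as the third entry of `sigma` here, simply receives the count 0.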

Definition 9. If $\mathfrak{A}$ is a structure and $\pi, \pi'$ are 1-types (not necessarily distinct),
we say that $\pi$ and $\pi'$ form a quiet pair in $\mathfrak{A}$ if there exist distinct elements $a$
and $a'$ of $A$, such that $\mathrm{tp}^{\mathfrak{A}}[a] = \pi$, $\mathrm{tp}^{\mathfrak{A}}[a'] = \pi'$ and $\mathrm{tp}^{\mathfrak{A}}[a, a']$ is silent.

Definition 10. Let $\mathcal{I}$ be the set of unordered pairs of (not necessarily distinct)
integers between 1 and $L$: that is, $\mathcal{I} = \{\{i, i'\} \mid 1 \leq i \leq i' \leq L\}$. A frame is a
triple $\mathcal{F} = (\bar{\sigma}, I, \theta)$, satisfying:

1. $\bar{\sigma} = (\sigma_1, \ldots, \sigma_N)$ is an $N$-tuple of pairwise distinct star-types for some
$N > 0$;

2. $I \subseteq \mathcal{I}$; and

3. $\theta : I \to \Xi$ is a function such that, for all $\{i, i'\} \in I$ with $i \leq i'$,
$\mathrm{tp}_1(\theta(\{i, i'\})) = \pi_i$ and $\mathrm{tp}_2(\theta(\{i, i'\})) = \pi_{i'}$.

The frame $\mathcal{F}$ is $D$-bounded if every star-type in $\bar{\sigma}$ is $D$-bounded. Likewise, $\mathcal{F}$ is
chromatic if every star-type in $\bar{\sigma}$ is chromatic.

Think of a frame $\mathcal{F} = (\bar{\sigma}, I, \theta)$ as a (putative) schematic description of a
structure, where $\bar{\sigma}$ tells us which star-types are realized, $I$ tells us which pairs
of 1-types are quiet, and $\theta$ selects, for each quiet pair of 1-types, a silent 2-type
joining them. More precisely:

Definition 11. Let $\mathfrak{A}$ be a structure and $\mathcal{F} = (\bar{\sigma}, I, \theta)$ a frame. We say that $\mathcal{F}$
describes $\mathfrak{A}$ if the following conditions hold:

1. $\bar{\sigma}$ is a list of all and only those star-types realized in $\mathfrak{A}$;

2. if $\pi_i$ and $\pi_{i'}$ form a quiet pair in $\mathfrak{A}$, then $\{i, i'\} \in I$;

3. if $\pi_i$ and $\pi_{i'}$ form a quiet pair in $\mathfrak{A}$, then there exist distinct $a, a' \in A$ such
that $\mathrm{tp}^{\mathfrak{A}}[a, a'] = \theta(\{i, i'\})$.

Frames contain the essential information required to determine whether
certain structures they describe are models of $\varphi^*$. The next definition employs the
notation established in Table 1 and Definition 6.

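Definition 12. We write $\mathcal{F} \models \varphi^*$ if the following conditions are satisfied:

1. for all $k$ ($1 \leq k \leq N$) and all $j$ ($1 \leq j \leq M$), if $\sigma_k[j] > 0$ then
$\models \bigwedge \mu_j \to \alpha(x, y) \wedge \alpha(y, x)$;

2. for all $\{i, i'\} \in I$, $\models \bigwedge \theta(\{i, i'\}) \to \alpha(x, y) \wedge \alpha(y, x)$;

3. for all $k$ ($1 \leq k \leq N$) and all $h$ ($1 \leq h \leq m$), the sum of all the $\sigma_k[j]$
($1 \leq j \leq M$) such that $f_h(x, y) \in \mu_j$ equals $C_h$.

The next lemma helps to motivate this definition.

Lemma 3. If $\mathfrak{A} \models \varphi^*$, then there exists a frame $\mathcal{F}$ describing $\mathfrak{A}$, such that
$\mathcal{F} \models \varphi^*$.

The proof is almost immediate: Conditions 1 and 2 in Definition 12 are
secured by the fact that $\mathfrak{A} \models \forall x \forall y(\alpha \vee x \approx y)$, while Condition 3 is secured by
the fact that $\mathfrak{A} \models \bigwedge_{1 \leq h \leq m} \forall x \exists_{=C_h} y(f_h(x, y) \wedge x \not\approx y)$. The following lemma
also follows almost immediately from the above definitions.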
Lemma 4. Let $\mathfrak{A}$ be a structure, $\mathcal{F}$ a frame describing $\mathfrak{A}$, and $D$ a positive
integer. Then:

1. $\mathcal{F}$ is $D$-bounded if and only if $\mathfrak{A}$ is $D$-bounded;

2. $\mathcal{F}$ is chromatic if and only if $\mathfrak{A}$ is chromatic.

However, while every structure is described by some frame, not every frame
describes a structure; and it is important for us to define a class of frames
which do. To this end, we associate with a frame $\mathcal{F}$ a collection of numerical
parameters, as follows.

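Notation 2. Let $\mathcal{F} = (\bar{\sigma}, I, \theta)$ be a frame, where $\bar{\sigma} = (\sigma_1, \ldots, \sigma_N)$, for some
$N > 0$, and recall the notation established in Table 1 and Definition 6. If $\mathcal{F}$ is
clear from context, for integers $i, k$ in the ranges $1 \leq i \leq L$, $1 \leq k \leq N$ write:

$$o_{ik} = \begin{cases} 1 & \text{if } \mathrm{tp}(\sigma_k) = \pi_i \\ 0 & \text{otherwise;} \end{cases}$$

$$p_{ik} = \begin{cases} 1 & \text{if, for all } j\ (1 \leq j \leq M),\ \mathrm{tp}_2(\mu_j) = \pi_i \text{ implies } \sigma_k[j] = 0 \\ 0 & \text{otherwise;} \end{cases}$$

$$r_{ik} = \sum_{j \in J} \sigma_k[j], \text{ where } J = \{j \mid M^* + 1 \leq j \leq M \text{ and } \mathrm{tp}_2(\mu_j) = \pi_i\};$$

$$s_{ik} = \sum_{j \in J} \sigma_k[j], \text{ where } J = \{j \mid 1 \leq j \leq M \text{ and } \mathrm{tp}_2(\mu_j) = \pi_i\}.$$

In addition, for integers $j, k$ in the ranges $1 \leq j \leq M^*$, $1 \leq k \leq N$, write:

$$q_{jk} = \sigma_k[j].$$
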

With this notation in hand we can characterize a class of frames whose
members are guaranteed to describe structures.

Definition 13. Let $\mathcal{F} = (\bar{\sigma}, I, \theta)$ be a frame, where $\bar{\sigma} = (\sigma_1, \ldots, \sigma_N)$. Let
$\bar{w} = (w_1, \ldots, w_N)$ be an $N$-tuple over $\mathbb{N}^*$. Using Notation 2, for all $i$ ($1 \leq i \leq L$),
all $i'$ ($1 \leq i' \leq L$) and all $j$ ($1 \leq j \leq M^*$), let:

$$u_i = \sum_{1 \leq k \leq N} o_{ik} w_k, \qquad v_j = \sum_{1 \leq k \leq N} q_{jk} w_k, \qquad x_{ii'} = \sum_{1 \leq k \leq N} o_{ik} p_{i'k} w_k.$$

We say that $\bar{w}$ is a solution of $\mathcal{F}$ if the following conditions
are satisfied for all $i$ ($1 \leq i \leq L$), all $i'$ ($1 \leq i' \leq L$), all $j$ ($1 \leq j \leq M^*$) and all
$k$ ($1 \leq k \leq N$):

(C1) $v_j = v_{j'}$, where $j'$ is such that $\mu_j^{-1} = \mu_{j'}$;

(C2) $s_{ik} \leq u_i$;

(C3) $u_i \leq 1$ or $u_i > Z$;

(C4) if $o_{ik} = 1$, then either $u_i > 1$ or $r_{i'k} \leq x_{i'i}$;

(C5) if $\{i, i'\} \notin I$, then either $u_i \leq 1$ or $u_{i'} \leq 1$;

(C6) if $\{i, i'\} \notin I$ and $o_{ik} = 1$, then $r_{i'k} \geq x_{i'i}$.

The conditions C1-C6 in Definition 13 may be written as a quantifier-free
formula in the language of Presburger arithmetic: in other words, as a
Boolean combination of linear inequalities with integer coefficients and
variables $w_1, \ldots, w_N$. By treating a negated inequality as a reversed inequality in
the obvious way, we may assume that the Boolean combination in question is
positive, i.e. involves only conjunction and disjunction. Denote this positive
Boolean combination of inequalities by $\mathcal{E}$. By definition, $\mathcal{F}$ has a solution if and
only if $\mathcal{E}$ is satisfied over $\mathbb{N}^*$; and $\mathcal{F}$ has a finite solution (i.e. a solution in which
all values are finite) if and only if $\mathcal{E}$ is satisfied over $\mathbb{N}$.

We are at last in a position to state the key lemmas of this section.

Lemma 5. If $\mathfrak{A}$ is a differentiated structure and $\mathcal{F} = (\bar{\sigma}, I, \theta)$ is a frame
describing $\mathfrak{A}$, then $H_{\bar{\sigma}}(\mathfrak{A})$ is a solution of $\mathcal{F}$.

Proof. Pratt-Hartmann [11], Lemma 13, Lemma 16. □

Lemma 6. If $\mathcal{F}$ is a chromatic frame such that $\mathcal{F} \models \varphi^*$, and $\bar{w}$ is a solution
of $\mathcal{F}$, then there exists a structure $\mathfrak{A}$ such that: (i) $\mathfrak{A} \models \varphi^*$; (ii) $\mathcal{F}$ describes $\mathfrak{A}$;
and (iii) $\bar{w} = H_{\bar{\sigma}}(\mathfrak{A})$.

Proof. Pratt-Hartmann [11], Lemma 14, Lemma 17. □

Lemmas 5 and 6 in effect state that, to determine the satisfiability of $\varphi^*$,
it suffices to guess a $C$-bounded, differentiated, chromatic frame $\mathcal{F}$, and to test
that $\mathcal{F}$ has a solution and that $\mathcal{F} \models \varphi^*$. Furthermore, by testing instead whether
$\mathcal{F}$ has a finite solution, we can determine the finite satisfiability of $\varphi^*$. The proof
of Lemma 5 is relatively straightforward; that of Lemma 6 is more challenging,
because it involves constructing a model $\mathfrak{A}$ of $\varphi^*$, given only the frame $\mathcal{F}$ and its
solution. It can in fact be shown that we may without loss of generality confine
attention to frames whose size (measured in the obvious way) is bounded by a
singly exponential function of the size of $\varphi^*$ (Pratt-Hartmann [11], Lemma 10).
From this it follows that the problems of determining the satisfiability/finite
satisfiability of a given $\mathcal{C}^2$-formula are in NEXPTIME. In the present context
of investigating the data-complexity of $\mathcal{C}^2$, however, this matter may be safely
ignored.
4 Data-complexity of satisfiability and finite satisfiability for $\mathcal{C}^2$

In this section, we give bounds on the data-complexity of satisfiability and finite
satisfiability in $\mathcal{C}^2$.

We consider the upper bounds first. For any $\mathcal{C}^2$-formula $\varphi$, we describe a pair
of non-deterministic polynomial-time procedures to determine the satisfiability
and finite satisfiability of $\Delta \cup \{\varphi\}$, where $\Delta$ is a given set of ground,
function-free literals. The strategy is as follows. Relying on Lemmas 5 and 6, we
guess a frame $\mathcal{F}$ such that $\mathcal{F} \models \varphi$, and assemble the inequalities required for
$\mathcal{F}$ to have a solution. By augmenting these inequalities with extra conditions
(based on $\Delta$), we can check for the existence of a (finite) model of $\varphi$ whose
histogram (with respect to some sequence of star-types) is such that a model of
$\Delta$ can be spliced into it, thus yielding a model of $\Delta \cup \{\varphi\}$.

If $\Delta$ is a set of ground, function-free literals, we denote by $\mathrm{const}(\Delta)$ the set
of individual constants occurring in $\Delta$.

Theorem 1. For any $\mathcal{C}^2$-sentence $\varphi$, both $\mathcal{S}_\varphi$ and $\mathcal{FS}_\varphi$ are in NP.

Proof. Let $\varphi$ be a $\mathcal{C}^2$-formula, and $\Delta$ a set of ground, function-free literals, over
a signature $\Sigma_\Delta$. Let $\varphi^*$ and $C$ be as in Lemma 1. Determining whether $\Delta \cup \{\varphi\}$
has a model of size $C$ or less is straightforward. For we may list, in constant
time, all models of $\varphi$ of size $C$ or less (interpreting the signature of $\varphi$). Fixing
any such model $\mathfrak{A}$, we may then guess an expansion $\mathfrak{A}^+$ of $\mathfrak{A}$ interpreting $\Sigma_\Delta$,
and check that $\mathfrak{A}^+ \models \Delta$. This (non-deterministic) process can be executed
in time bounded by a linear function of $\|\Delta\|$. Hence, it suffices to determine
whether $\Delta \cup \{\varphi^*\}$ has a model.

From now on, we fix the formula $\varphi^*$ having the form (2), and employ the
notation of Table 1 together with the associated notions of 1-type,
message-type and star-type over the signature $\Sigma^*$. Since $\Sigma^*$ contains $2\lceil \log Z \rceil + 1$ unary
predicates not occurring in $\varphi^*$, pick one of these extra predicates, $o$. We call a
1-type $\pi$ observable if $o(x) \in \pi$, we call a message-type $\mu$ observable if $\mathrm{tp}_1(\mu)$ and
$\mathrm{tp}_2(\mu)$ are observable, and we call a star-type $\sigma$ observable if $\mathrm{tp}(\sigma)$ is observable.
Informally (and somewhat approximately), we read $o(x)$ as "$x$ is an element
which interprets a constant in $\Delta$".

We now define two non-deterministic procedures operating on $\varphi^*$ and $\Delta$.
We show that both procedures run in time bounded by a polynomial function
of $\|\Delta\|$, that the first of these procedures has a successful run if and only if
$\Delta \cup \{\varphi^*\}$ is satisfiable, and that the second has a successful run if and only
if $\Delta \cup \{\varphi^*\}$ is finitely satisfiable. This proves the theorem. Procedure I is as
follows.

1. Guess a structure $\mathfrak{D}^+$ interpreting the signature $\Sigma^* \cup \Sigma_\Delta$ over a domain
$D$ with $|D| \leq |\mathrm{const}(\Delta)|$; and let $\mathfrak{D}$ be the reduct of $\mathfrak{D}^+$ to the signature
$\Sigma^*$. If $\mathfrak{D}^+ \not\models \Delta$ or $\mathfrak{D} \not\models \forall x \forall y(\alpha \vee x \approx y)$, then fail.

2. Guess a list $\sigma_1, \ldots, \sigma_{N'}$ of observable, $C$-bounded, chromatic star-types,
and guess a further list $\sigma_{N'+1}, \ldots, \sigma_N$ of non-observable, $C$-bounded,
chromatic star-types. Write

$$\bar{\sigma} = (\sigma_1, \ldots, \sigma_{N'}, \sigma_{N'+1}, \ldots, \sigma_N),$$

and guess a frame $\mathcal{F} = (\bar{\sigma}, I, \theta)$ with these star-types. If $\mathcal{F} \not\models \varphi^*$, then
fail.

3. Guess a function $\delta : D \to \{\sigma_1, \ldots, \sigma_{N'}\}$ mapping every element of $D$ to
one of the observable star-types of $\mathcal{F}$. Writing $(\pi^d, (v^d_1, \ldots, v^d_M))$ for $\delta(d)$,
if, for any $d \in D$, either of the conditions

(a) $\pi^d = \mathrm{tp}^{\mathfrak{D}}[d]$

(b) for all $j$ ($1 \leq j \leq M$) such that $\mu_j$ is an observable message-type,

$$v^d_j = |\{d' \in D \mid d' \neq d \text{ and } \mathrm{tp}^{\mathfrak{D}}[d, d'] = \mu_j\}|$$

does not hold, then fail. Otherwise, record the numbers $n_1, \ldots, n_{N'}$,
where, for all $k$ ($1 \leq k \leq N'$), $n_k = |\delta^{-1}(\sigma_k)|$, and then forget $\delta$.

4. Let $\mathcal{E}$ be the (positive) Boolean combination of inequalities required for $\mathcal{F}$
to have a solution, as explained in Section 3. Guess the truth-values of all
the inequalities involved in $\mathcal{E}$. If the guess makes $\mathcal{E}$ false (considered as a
Boolean combination), fail; otherwise, let $\mathcal{E}'$ be the set of these inequalities
guessed to be true.

5. Recalling the numbers $n_k$ from Step 3, let

$$\mathcal{E}'_\delta = \mathcal{E}' \cup \{w_k = n_k \mid 1 \leq k \leq N'\}.$$

If there is no solution of $\mathcal{E}'_\delta$, then fail.

6. Succeed.

Procedure II is exactly the same as Procedure I, except in Step 5. Instead of
failing if there is no solution of $\mathcal{E}'_\delta$, we instead fail if there is no finite solution
of $\mathcal{E}'_\delta$.
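Step 4 of the procedures is a non-deterministic guess. A deterministic simulation, sketched below, simply enumerates every truth-value assignment to the inequalities and keeps those under which the positive Boolean combination evaluates to true. The nested-tuple encoding of the combination (tags "and", "or" and "ineq") and the inequality names are assumptions introduced only for this illustration; the sketch does not attempt Step 5, i.e. it does not test whether the chosen inequalities have a simultaneous solution.

```python
from itertools import product


def atoms(e):
    """Collect the names of all inequalities occurring in the combination."""
    if e[0] == "ineq":
        return {e[1]}
    return set().union(*(atoms(sub) for sub in e[1:]))


def holds(e, true_atoms):
    """Evaluate the positive Boolean combination under a guess."""
    if e[0] == "ineq":
        return e[1] in true_atoms
    results = [holds(sub, true_atoms) for sub in e[1:]]
    return all(results) if e[0] == "and" else any(results)


def accepted_guesses(e):
    """Yield each set of inequalities guessed true that makes the combination true."""
    names = sorted(atoms(e))
    for bits in product([False, True], repeat=len(names)):
        chosen = {name for name, bit in zip(names, bits) if bit}
        if holds(e, chosen):
            yield chosen


# Hypothetical combination: A and (B or C), with A, B, C naming inequalities.
E = ("and", ("ineq", "A"), ("or", ("ineq", "B"), ("ineq", "C")))
guesses = list(accepted_guesses(E))
print(guesses)
```

Since the number of inequalities depends only on the fixed formula and frame, not on the data, this enumeration has constant size from the point of view of data-complexity.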

We consider the running time of Procedure I, writing $\|\Delta\| = n$. Step 1 can be
executed in time bounded by a polynomial function of $n$. Step 2 can be executed
in constant time. In executing Step 3, we note that, once $\delta(d)$ has been guessed
and checked, the space required to do so can be recovered; only the tallies
$n_1, \ldots, n_{N'}$ need be kept, and this never requires more than $N' \log n$ space.
Moreover, in checking $\delta(d)$, the only difficulty is to compute the quantities
$|\{d' \in D \mid d' \neq d \text{ and } \mathrm{tp}^{\mathfrak{D}}[d, d'] = \mu_j\}|$
for observable message-types $\mu_j$; but this never requires more than $\log n$ space.
Hence Step 3 can be executed in space $O(\log n)$, and hence in time bounded
by a polynomial function of $n$. Step 4 can be executed in constant time. Step 5
involves determining the existence of a solution to the inequalities in $\mathcal{E}'_\delta$. Since
the size of $\mathcal{E}'$ is bounded by a constant, the size of $\mathcal{E}'_\delta$ is in fact $O(\log n)$;
moreover, $\mathcal{E}'_\delta$ involves a fixed number of variables. After guessing which of these
variables take infinite values, this problem can be solved using Lenstra's
algorithm (Lenstra [8]) in time bounded by some fixed polynomial function of $\log n$,
and hence certainly in time $O(n)$. Thus, Procedure I can be executed in
polynomial time. Procedure II can also be executed in polynomial time, by an almost
identical argument.

We show that Procedure I has a successful run if and only if $\Delta \cup \{\varphi^*\}$ is
satisfiable, and that Procedure II has a successful run if and only if $\Delta \cup \{\varphi^*\}$ is finitely
satisfiable. Suppose $\mathfrak{A}^+$ is a finite or countably infinite model of $\Delta \cup \{\varphi^*\}$,
interpreting the signature $\Sigma^* \cup \Sigma_\Delta$ over a domain $A$; let $\mathfrak{A}$ be the reduct of $\mathfrak{A}^+$
to $\Sigma^*$; and let $D \subseteq A$ be the set of all and only those elements
interpreting the constants $\mathrm{const}(\Delta)$ in $\mathfrak{A}^+$. By assumption, $\Sigma^*$ contains $2\lceil \log Z \rceil + 1$
unary predicates not occurring in $\varphi^*$, one of which is the predicate $o$. By
re-interpreting these new predicates if necessary, we may assume that $o^{\mathfrak{A}} = D$,
and furthermore (by LemmaH]) that $\mathfrak{A}$ is differentiated and chromatic. Let $\mathfrak{D}^+$
be the restriction of $\mathfrak{A}^+$ to $D$, and $\mathfrak{D}$ the restriction of $\mathfrak{A}$ to $D$ (so that $\mathfrak{D}$ is
a reduct of $\mathfrak{D}^+$). With these choices, Step 1 succeeds. By Lemma [21 let $\mathcal{F}$
be a frame describing $\mathfrak{A}$ such that $\mathcal{F} \models \varphi^*$. By Lemma SI Parts 1 and 2, $\mathcal{F}$
is $C$-bounded and chromatic. Without loss of generality, we may assume the
star-types in $\mathcal{F}$ to be $\bar{\sigma} = \sigma_1, \ldots, \sigma_{N'}, \sigma_{N'+1}, \ldots, \sigma_N$, where $\sigma_1, \ldots, \sigma_{N'}$ are the
star-types realized in $\mathfrak{A}$ by elements of $D$, and $\sigma_{N'+1}, \ldots, \sigma_N$ are the star-types
realized in $\mathfrak{A}$ by elements of $A \setminus D$. With these choices, Step 2 succeeds. Define
$\delta : D \to \{\sigma_1, \ldots, \sigma_{N'}\}$ by setting $\delta(d) = \mathrm{st}^{\mathfrak{A}}[d]$. With these choices, Step 3
succeeds. Let $\bar{w} = H_{\bar{\sigma}}(\mathfrak{A})$, so that, by LemmaO $\bar{w}$ is a solution of $\mathcal{E}$. Let $\mathcal{E}'$ be the
set of inequalities mentioned in $\mathcal{E}$ which are satisfied by $\bar{w}$. With these choices,
Step 4 succeeds. The above choice of $\bar{w}$ ensures that $\bar{w}$ satisfies $\mathcal{E}'$; to show that
Step 5, and hence the whole procedure, succeeds, it suffices to show that, for
all $k$ ($1 \le k \le N'$), $w_k = n_k$. Now, since $o^{\mathfrak{A}} = D$, $a \in A$ has an observable
star-type $\sigma_k$ if and only if $a \in D$. But for $d \in D$, we have $\delta(d) = \mathrm{st}^{\mathfrak{A}}[d]$, whence
$n_k = |\delta^{-1}(\sigma_k)|$ is the number of elements $d \in D$ such that $\mathrm{st}^{\mathfrak{A}}[d] = \sigma_k$, and
hence the number of elements $a \in A$ such that $\mathrm{st}^{\mathfrak{A}}[a] = \sigma_k$. That is: $w_k = n_k$
as required. The corresponding argument for Procedure II is almost identical,
noting that, if $\mathfrak{A}^+$ is finite, then $\bar{w} = H_{\bar{\sigma}}(\mathfrak{A})$ will consist entirely of finite values.

Suppose, conversely, that Procedure I has a successful run. Let $\mathfrak{D}^+$, $\mathfrak{D}$, $\delta$, $\mathcal{F}$,
and $\mathcal{E}'$ be as guessed in this run, and let $\bar{w} = w_1, \ldots, w_N$ be a solution of $\mathcal{E}'_\delta$,
guaranteed by the fact that Step 5 succeeds. Since Step 1 succeeds, we have
$\mathfrak{D}^+ \models \Delta$, and $\mathfrak{D} \models \forall x \forall y(\alpha \vee x \approx y)$. By assumption, $\mathcal{F}$ is chromatic; moreover,
since Step 2 succeeds, $\mathcal{F} \models \varphi^*$. Since Step 4 succeeds, $\bar{w}$ is a solution of the
Boolean combination of inequalities $\mathcal{E}$, and hence a solution of the frame $\mathcal{F}$. By
Lemma 6, then, let $\mathfrak{A}$ be a model of $\varphi^*$ described by $\mathcal{F}$ in which the star-types
$\sigma_1, \ldots, \sigma_N$ are realized $w_1, \ldots, w_N$ times, respectively.

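We proceed to define a structure $\mathfrak{A}'$ with an expansion $\mathfrak{A}'^{+}$ such that
$\mathfrak{A}'^{+} \models \Delta \cup \{\varphi^*\}$. Let $D' = o^{\mathfrak{A}}$,
and, for all $k$ ($1 \le k \le N'$), let $D'_k = \{a \in A \mid \mathrm{st}^{\mathfrak{A}}[a] = \sigma_k\}$. Evidently, the
sets $D'_1, \ldots, D'_{N'}$ partition $D'$. On the other hand, consider the domain $D$ of
the structure $\mathfrak{D}$, and, for all $k$ ($1 \le k \le N'$), let $D_k = \delta^{-1}(\sigma_k)$. These sets
are pairwise disjoint, and from the fact that $\bar{w}$ is a solution of $\mathcal{E}'_\delta$, we have
$|D_k| = |D'_k|$, for all $k$ ($1 \le k \le N'$). By replacing $\mathfrak{A}$ with a suitable isomorphic
copy if necessary, we can assume that $D_k = D'_k$ for all $k$ ($1 \le k \le N'$). We thus
have: (i) $D = D' \subseteq A$; (ii) $\mathrm{st}^{\mathfrak{A}}[d] = \delta(d)$ for all $d \in D$; and (iii) $o^{\mathfrak{A}} = D$. Now
define the structure $\mathfrak{A}'$ interpreting $\Sigma^*$ over the domain $A$ by setting:

$$\mathrm{tp}^{\mathfrak{A}'}[a,b] = \begin{cases} \mathrm{tp}^{\mathfrak{D}}[a,b] & \text{if } a \in D \text{ and } b \in D \\ \mathrm{tp}^{\mathfrak{A}}[a,b] & \text{otherwise.} \end{cases}$$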

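To ensure that no clashes can occur in these assignments, we must show that
$\mathrm{tp}^{\mathfrak{A}}[a] = \mathrm{tp}^{\mathfrak{D}}[a]$ for all $a \in D$. But this follows from the success of Step 3
(specifically, from Condition 3(a)) and the already-established fact that $\mathrm{st}^{\mathfrak{A}}[a] =
\delta(a)$. By construction, then, $\mathfrak{D} \subseteq \mathfrak{A}'$. Indeed, taking $\mathfrak{A}'^{+}$ to be the expansion
of $\mathfrak{A}'$ obtained by interpreting the symbols of $\Sigma_\Delta \setminus \Sigma^*$ in the same way as
$\mathfrak{D}^+$, we immediately have $\mathfrak{A}'^{+} \models \Delta$. To show that $\Delta \cup \{\varphi^*\}$ is satisfiable,
therefore, we require only to show that $\mathfrak{A}' \models \varphi^*$. Note first of all that the only
2-types realized in $\mathfrak{A}'$ are 2-types realized either in $\mathfrak{A}$ or in $\mathfrak{D}$. But $\mathfrak{A} \models \varphi^*$,
and $\mathfrak{D} \models \forall x \forall y(\alpha \vee x \approx y)$, whence $\mathfrak{A}' \models \forall x \forall y(\alpha \vee x \approx y)$. Therefore, it
suffices to show that, for all $a \in A$, $\mathrm{st}^{\mathfrak{A}'}[a] = \mathrm{st}^{\mathfrak{A}}[a]$, from which it follows that
$\mathfrak{A}' \models \bigwedge_{1 \le h \le m} \forall x \exists_{=C_h} y (f_h(x,y) \wedge x \not\approx y)$. If $a \notin D$, then $\mathrm{st}^{\mathfrak{A}'}[a] = \mathrm{st}^{\mathfrak{A}}[a]$ is
immediate from the construction of $\mathfrak{A}'$; so suppose $a = d \in D$. Let us write

$$\mathrm{st}^{\mathfrak{A}}[d] = \delta(d) = (\pi, (v_1, \ldots, v_M))$$
$$\mathrm{st}^{\mathfrak{A}'}[d] = (\pi, (v'_1, \ldots, v'_M)).$$

Fix $j$ ($1 \le j \le M$), and suppose first that $\mu_j$ is not observable. Since $D \subseteq o^{\mathfrak{A}}$,
we have, by the construction of $\mathfrak{A}'$, $\mathrm{tp}^{\mathfrak{A}'}[d,b] = \mu_j$ if and only if $b \notin D$ and
$\mathrm{tp}^{\mathfrak{A}}[d,b] = \mu_j$; it is then immediate that $v'_j = v_j$. Suppose, on the other
hand, that $\mu_j$ is observable. Since $o^{\mathfrak{A}} \subseteq D$, we have, by the construction of $\mathfrak{A}'$,
$\mathrm{tp}^{\mathfrak{A}'}[d,b] = \mu_j$ if and only if $b \in D$ and $\mathrm{tp}^{\mathfrak{D}}[d,b] = \mu_j$; but then the success of
Step 3 (specifically, Condition 3(b)) guarantees that $v'_j = v_j$. Hence, for all
$a \in A$, $\mathrm{st}^{\mathfrak{A}'}[a] = \mathrm{st}^{\mathfrak{A}}[a]$, as required. The corresponding argument for Procedure
II is almost identical: we need only observe that, by requiring the numbers
$w_{N'+1}, \ldots, w_N$ to be in $\mathbb{N}$, the constructed model $\mathfrak{A}'^{+}$ will be finite. □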

The matching lower bound to Theorem 1 is almost trivial. In fact, much
smaller fragments than $\mathcal{C}^2$ suffice for this purpose: recall that $\mathcal{G}^{2-}$ is the
fragment of $\mathcal{GC}^2$ in which no counting quantifiers and no instances of $\approx$ occur.

Theorem 2. There exists a $\mathcal{G}^{2-}$-sentence $\varphi$ for which the problems $\mathcal{S}_\varphi$ and
$\mathcal{FS}_\varphi$ coincide, and are NP-hard.

Proof. By reduction of 3SAT. Let $c$ and $t$ be unary predicates and $l_1$, $l_2$, $l_3$, $o$
and $s$ binary predicates. (Read $c(x)$ as "$x$ is a clause", $l_i(x,y)$ as "$y$ is the
$i$th literal of $x$", $t(x)$ as "$x$ is a true literal", $o(x,y)$ as "$x$ and $y$ are mutually
opposite literals", and $s(x,y)$ as "$x$ and $y$ are the same literal".) Let $\varphi$ be

$$\forall x\Big(c(x) \to \bigvee_{1 \le i \le 3} \exists y(l_i(x,y) \wedge t(y))\Big) \wedge$$

$$\forall x \forall y(o(x,y) \to (t(x) \leftrightarrow \neg t(y))) \wedge \forall x \forall y(s(x,y) \to (t(x) \leftrightarrow t(y))) \wedge$$

$$\bigwedge_{1 \le i \le 3} \forall x\big(\exists y(l_i(x,y) \wedge t(y)) \to \forall y(l_i(x,y) \to t(y))\big).$$

We reduce 3SAT to the problems $\mathcal{S}_\varphi$ and $\mathcal{FS}_\varphi$, which we simultaneously show
to be identical. Suppose a finite set $\Gamma = \{C_1, \ldots, C_n\}$ of 3-literal clauses is
given, where $C_i = L_{i,1} \vee L_{i,2} \vee L_{i,3}$. Let $a_i$ ($1 \le i \le n$) and $b_{i,j}$ ($1 \le i \le n$;
$1 \le j \le 3$) be pairwise distinct individual constants, and let $\Delta_\Gamma$ be the following
set of ground, function-free literals:

$$\{c(a_i) \mid 1 \le i \le n\} \cup \{l_j(a_i, b_{i,j}) \mid 1 \le i \le n \text{ and } 1 \le j \le 3\} \cup$$

$$\{o(b_{i,j}, b_{i',j'}) \mid L_{i,j} \text{ and } L_{i',j'} \text{ are opposite literals}\} \cup$$

$$\{s(b_{i,j}, b_{i',j'}) \mid L_{i,j} \text{ and } L_{i',j'} \text{ are the same literal}\}.$$

It is routine to check that: (i) if $\{\varphi\} \cup \Delta_\Gamma$ is satisfiable, then $\Gamma$ is satisfiable;
(ii) if $\Gamma$ is satisfiable, then $\{\varphi\} \cup \Delta_\Gamma$ is finitely satisfiable. □
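
The data set built in the reduction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: clauses are encoded as 3-tuples of non-zero integers (negative meaning negated), the function name `build_delta` and the constant names `a1`, `b1_2`, ... are hypothetical, and literals are represented as plain tuples.

```python
from itertools import product


def build_delta(clauses):
    """Return the ground literals of the reduction as a set of tuples.

    clauses: list of 3-tuples of non-zero ints; -k stands for the negation
    of variable k (an encoding assumed here purely for illustration).
    """
    delta = set()
    literal_of = {}  # constant name b_{i,j} -> the literal L_{i,j} it names
    for i, clause in enumerate(clauses, start=1):
        delta.add(("c", f"a{i}"))                 # c(a_i)
        for j, lit in enumerate(clause, start=1):
            b = f"b{i}_{j}"
            literal_of[b] = lit
            delta.add((f"l{j}", f"a{i}", b))      # l_j(a_i, b_{i,j})
    for (b1, x), (b2, y) in product(literal_of.items(), repeat=2):
        if x == -y:
            delta.add(("o", b1, b2))              # opposite literals
        elif x == y:
            delta.add(("s", b1, b2))              # same literal
    return delta
```

The output has size polynomial in the number of clauses, which is all the reduction needs; the background sentence stays fixed and only this set varies with the 3SAT instance.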

Since, as we remarked above, the (finite) query-answering problem is at least
as hard as the (finite) unsatisfiability problem, Theorem 2 also provides a lower
bound for the complexity of (finite) query answering in $\mathcal{GC}^2$ (matching
Theorem 4 below). Specifically, let $\varphi \in \mathcal{GC}^2$ be the sentence constructed in the
proof of Theorem 2 and $p$ a unary predicate; then the problems $\mathcal{Q}_{\varphi, \exists x p(x)}$ and
$\mathcal{FQ}_{\varphi, \exists x p(x)}$ coincide, and are co-NP-complete. We remark that lower complexity
bounds of co-NP for query-answering problems cannot always be obtained
in this way (i.e. by reduction to the corresponding unsatisfiability problem),
especially in inexpressive fragments. A good example is provided by the
fragments considered in Calvanese et al. [3] (Theorem 8), who use instead a closely
related result on 'instance checking' in description logics (Schaerf [13], Theorem
3.2). For similar results concerning an expressive logic, see Hustadt et al. [5],
Theorems 20 and 26.

We conclude this section by showing that there is no hope of extending
Theorem 1 to a result concerning query answering: query-answering and finite
query-answering problems with respect to $\mathcal{C}^2$-formulas are in general
undecidable. (Again, much smaller fragments than $\mathcal{C}^2$ suffice for this purpose.) We
employ the standard apparatus of tiling systems. In this context, recall that
a tiling system is a triple $T = (C, H, V)$, where $C$ is a non-empty, finite set of
tiles and $H$, $V$ are binary relations on $C$. For $N \in \mathbb{N}$, let $\mathbb{N}_N$ denote the set
$\{0, 1, \ldots, N-1\}$. An infinite tiling for $T$ is a function $f : \mathbb{N}^2 \to C$ such that, for
all $i, j \in \mathbb{N}$, $(f(i,j), f(i+1,j)) \in H$ and $(f(i,j), f(i,j+1)) \in V$. An $N$-tiling for
$T$ is a function $f : \mathbb{N}_N^2 \to C$ such that, for all $i, j \in \mathbb{N}_N$, $(f(i,j), f(i+1,j)) \in H$
and $(f(i,j), f(i,j+1)) \in V$ (addition modulo $N$). The infinite tiling
problem on $T$ is the following problem: given a sequence $c_0, \ldots, c_n$ of elements of
$C$ (repeats allowed), determine whether there exists an infinite tiling $f$ for $T$
such that $f(i,0) = c_i$ for all $i$ ($0 \le i \le n$). The finite tiling problem on $T$ is
the following problem: given a sequence $c_0, \ldots, c_n$ of elements of $C$ (repeats
allowed), determine whether there exist an $N > n$ and an $N$-tiling $f$ for $T$ such
that $f(i,0) = c_i$ for all $i$ ($0 \le i \le n$). It is well-known that there exist tiling
systems for which the infinite tiling problem is co-r.e.-complete, and that there
exist tiling systems for which the finite tiling problem is r.e.-complete.
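
To make the definition of an $N$-tiling concrete, the following Python sketch checks whether a candidate function on the $N \times N$ torus respects the horizontal and vertical constraints, and whether it extends a given initial row. The names `is_n_tiling` and `extends_row`, and the representation of $f$ as a dictionary keyed by coordinate pairs, are assumptions made for illustration only. Checking a given candidate is easy; the hardness results quoted above concern the existence of a tiling.

```python
def is_n_tiling(f, size, H, V):
    """Check that f (a dict from (i, j) to tiles) is an N-tiling with N = size.

    Indices wrap around modulo N, as in the definition of an N-tiling.
    """
    for i in range(size):
        for j in range(size):
            right = f[((i + 1) % size, j)]
            above = f[(i, (j + 1) % size)]
            if (f[(i, j)], right) not in H:
                return False
            if (f[(i, j)], above) not in V:
                return False
    return True


def extends_row(f, row):
    """Check that f(i, 0) equals the i-th tile of the given sequence."""
    return all(f[(i, 0)] == tile for i, tile in enumerate(row))
```

For instance, with two tiles that must alternate in both directions, a checkerboard pattern is a 2-tiling but not a 3-tiling, because the wrap-around on a torus of odd size places equal tiles next to each other.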

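Lemma 7. Let $h$ and $v$ be binary predicates, and let $\gamma$ be the formula

$$\forall x_1 \forall x_2 \forall x_3 \forall x_4 (h(x_1, x_2) \wedge v(x_1, x_3) \wedge v(x_2, x_4) \to h(x_3, x_4)).$$

There exists a sentence $\varphi$ in $\mathcal{G}^{2-}$ such that the problem $\mathcal{S}_{\varphi \wedge \gamma}$ is co-r.e.-complete.
There exists a sentence $\varphi$ in $\mathcal{G}^{2-}$ such that the problem $\mathcal{FS}_{\varphi \wedge \gamma}$ is r.e.-complete.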

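Proof. Let $T = (C, H, V)$ be a tiling system whose infinite tiling problem is
co-r.e.-complete. Treating the tiles $c \in C$ as unary predicates, let $\varphi_0$ be the
formula

$$\forall x \exists y\, h(x, y) \wedge \forall x \exists y\, v(x, y),$$

let $\varphi_T$ be the formula

$$\forall x \Big( \bigvee_{c \in C} c(x) \Big) \wedge \bigwedge_{c \ne c'} \forall x (c(x) \to \neg c'(x)) \wedge$$

$$\bigwedge_{(c,c') \notin H} \forall x \forall y (h(x, y) \to \neg(c(x) \wedge c'(y))) \wedge$$

$$\bigwedge_{(c,c') \notin V} \forall x \forall y (v(x, y) \to \neg(c(x) \wedge c'(y))),$$

and let $\varphi$ be $\varphi_0 \wedge \varphi_T$. Now, given a sequence $\bar{c} = c_0, \ldots, c_n$ of elements of $C$
(repeats allowed), let $a_0, \ldots, a_n$ be individual constants, and let $\Delta_{\bar{c}}$ be the set
of ground, function-free literals

$$\{c_0(a_0), h(a_0, a_1), c_1(a_1), h(a_1, a_2), \ldots, c_{n-1}(a_{n-1}), h(a_{n-1}, a_n), c_n(a_n)\}.$$

We claim that the instance $\bar{c}$ of the infinite tiling problem for $T$ is positive if and
only if $\Delta_{\bar{c}} \cup \{\varphi \wedge \gamma\}$ is satisfiable. Thus, the problem $\mathcal{S}_{\varphi \wedge \gamma}$ is co-r.e.-complete,
proving the first statement of the lemma.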

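To prove the claim, if $f$ is an infinite tiling for $T$ with $f(i, 0) = c_i$ for all $i$
($0 \le i \le n$), construct the model $\mathfrak{A}$ as follows. Let $A = \mathbb{N}^2$; let $a_i^{\mathfrak{A}} = (i, 0)$ for all
$i$ ($0 \le i \le n$); let $h^{\mathfrak{A}} = \{((i,j), (i+1,j)) \mid i, j \in \mathbb{N}\}$; let $v^{\mathfrak{A}} = \{((i,j), (i,j+1)) \mid
i, j \in \mathbb{N}\}$; and let $c^{\mathfrak{A}} = \{(i,j) \mid f(i,j) = c\}$ for all $c \in C$. It is routine to check
that $\mathfrak{A} \models \{\varphi \wedge \gamma\} \cup \Delta_{\bar{c}}$. Conversely, suppose $\mathfrak{A} \models \{\varphi \wedge \gamma\} \cup \Delta_{\bar{c}}$. Define a function
$g : \mathbb{N}^2 \to A$ as follows. First, set $g(i,0) = a_i^{\mathfrak{A}}$ for all $i$ ($0 \le i \le n$). Now, if $i$
is the largest integer such that $g(i,0)$ has been defined, select any $b \in A$ such
that $(g(i,0), b) \in h^{\mathfrak{A}}$ (possible, since $\mathfrak{A} \models \varphi_0$), and set $g(i+1, 0) = b$. This
defines $g(i,0)$ for all $i \in \mathbb{N}$. Fixing any $i$, if $j$ is the largest integer such that
$g(i,j)$ has been defined, select any $b \in A$ such that $(g(i,j), b) \in v^{\mathfrak{A}}$ (possible,
since $\mathfrak{A} \models \varphi_0$), and set $g(i, j+1) = b$. This defines $g(i,j)$ for all $i, j \in \mathbb{N}$.
Since $\mathfrak{A} \models \Delta_{\bar{c}} \cup \{\gamma\}$, we have, for all $i, j \in \mathbb{N}$, $(g(i,j), g(i+1,j)) \in h^{\mathfrak{A}}$ and
$(g(i,j), g(i,j+1)) \in v^{\mathfrak{A}}$. We now define an infinite tiling $f : \mathbb{N}^2 \to C$ as follows.
Since $\mathfrak{A} \models \varphi_T$, we set $f(i,j)$ to be the unique $c \in C$ such that $\mathfrak{A} \models c[g(i,j)]$.
Finally, since $\mathfrak{A} \models \Delta_{\bar{c}}$, we have $f(i, 0) = c_i$ for all $i$ ($0 \le i \le n$).

The second statement of the lemma is proved analogously. □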
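
The data set used in this proof is just a chain of constants joined by $h$ and labelled with the given tiles. A minimal Python sketch follows, assuming tiles are given by name and literals are represented as tuples; the function name `build_delta_c` and the constant names `a0`, `a1`, ... are hypothetical.

```python
def build_delta_c(tiles):
    """Return the literals c_0(a_0), h(a_0, a_1), c_1(a_1), ..., c_n(a_n) in order.

    tiles: the sequence c_0, ..., c_n of tile names (repeats allowed).
    """
    delta = []
    for i, tile in enumerate(tiles):
        delta.append((tile, f"a{i}"))                    # c_i(a_i)
        if i + 1 < len(tiles):
            delta.append(("h", f"a{i}", f"a{i + 1}"))    # h(a_i, a_{i+1})
    return delta
```

A sequence of $n + 1$ tiles yields $2n + 1$ literals, so the data grow linearly with the tiling instance while the sentence from the lemma stays fixed.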

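Recall that we denote by $\mathcal{L}^{2-}$ the fragment of $\mathcal{C}^2$ in which no counting
quantifiers and no instances of $\approx$ occur.

Theorem 3. There exist an $\mathcal{L}^{2-}$-sentence $\varphi'$ and a positive conjunctive query
$\psi(\bar{y})$ such that $\mathcal{Q}_{\varphi', \psi(\bar{y})}$ is undecidable. Similarly for $\mathcal{FQ}_{\varphi', \psi(\bar{y})}$.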

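Proof. We deal with $\mathcal{Q}_{\varphi', \psi(\bar{y})}$ only; the proof for $\mathcal{FQ}_{\varphi', \psi(\bar{y})}$ is analogous. Let
the binary predicate $h$ and the formulas $\gamma$ and $\varphi$ be as in (the first statement
of) Lemma 7. Let $p$ be a new unary predicate and $\bar{h}$ a new binary predicate.
Now let $\varphi'$ be the formula

$$\varphi \wedge \forall x \forall y (h(x, y) \leftrightarrow \neg \bar{h}(x, y)),$$

and $\psi$ the positive conjunctive query

$$\exists x_1 \exists x_2 \exists x_3 \exists x_4 \exists x (h(x_1, x_2) \wedge v(x_1, x_3) \wedge v(x_2, x_4) \wedge \bar{h}(x_3, x_4) \wedge p(x)).$$

It is obvious that, if $\Delta$ is any set of ground, non-functional literals (not involving
the predicates $p$ or $\bar{h}$), then

$\Delta \cup \{\varphi'\} \models \psi$ iff $\Delta \cup \{\varphi' \wedge \gamma\} \models \exists x\, p(x)$
iff $\Delta \cup \{\varphi' \wedge \gamma\}$ is unsatisfiable
iff $\Delta \cup \{\varphi \wedge \gamma\}$ is unsatisfiable.

It follows from Lemma 7 that $\mathcal{Q}_{\varphi', \psi}$ is undecidable. □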

We remark that, at the cost of complicating the above proofs, the
formula $\gamma$ in Lemma 7 could in fact have been replaced by the simpler formula
$\forall x_1 \forall x_2 \forall x_3 (r(x_1, x_2) \wedge r(x_2, x_3) \to r(x_1, x_3))$, asserting the transitivity of a
binary relation. Indeed, it is known that extending $\mathcal{C}^2$, or even $\mathcal{GC}^2$, with the
ability to express transitivity of relations renders the satisfiability problem for
this fragment undecidable. (Tendera [15] shows this in the case of four
transitive relations; see also Grädel and Otto [5] for closely related results.) Notice in
this context that the formula $\varphi'$ constructed in the proof of Theorem 3 is not
in $\mathcal{GC}^2$, since it contains the non-guarded conjunct $\forall x \forall y(h(x, y) \leftrightarrow \neg \bar{h}(x, y))$. As
we shall see in the next section, this is no accident: query-answering and finite
query-answering are decidable with respect to sentences of $\mathcal{GC}^2$ and positive
conjunctive queries. For an investigation of the data-complexity of
satisfiability and query-answering in certain logics featuring both counting quantifiers
and transitive predicates, and indeed of practical methods for solving these
problems, see, for example, Hustadt et al. [6], Glimm et al. [4], Ortiz et al. [9].

5 The fragment $\mathcal{GC}^2$


In this section, we establish some facts about $\mathcal{GC}^2$ which will subsequently be
used to analyse the complexity of query-answering and finite query-answering
within this fragment. To help motivate this analysis, we begin with an overview
of our approach.

Let $\varphi$ be a sentence of $\mathcal{GC}^2$, $\Delta$ a set of ground, function-free literals, and
$\psi(\bar{y})$ a positive conjunctive query. For simplicity, let us assume for the moment
that the tuple $\bar{y}$ is empty, that is, $\psi$ is the Boolean query

$$\exists x_1 \ldots \exists x_n (p_1(y_1, z_1) \wedge \cdots \wedge p_s(y_s, z_s)), \qquad (3)$$

where the $y_i$ and $z_i$ are chosen from among the set of variables $V = \{x_1, \ldots, x_n\}$.
Formula (3) defines a graph $(V, E)$ on this set in a natural way: $(x_i, x_j) \in E$
just in case $i \ne j$ and, for some $k$ ($1 \le k \le s$), $\{x_i, x_j\} = \{y_k, z_k\}$. Again,
for simplicity, let us assume for the moment that the resulting graph, $(V, E)$, is
connected.

Now, there are two possibilities: either the graph $(V, E)$ contains a loop (that
is: it is 2-connected) or it does not (that is: it is a tree). If the latter, it can be
shown (in a lemma below) that $\psi$ is logically equivalent to some $\mathcal{GC}^2$-formula
$\pi$. But then the problem $\mathcal{Q}_{\varphi, \psi}$ is the complement of the problem $\mathcal{S}_{\varphi \wedge \neg\pi}$, which
is in NP by Theorem 1. Suppose, therefore, that $(V, E)$ contains a loop, and
consider any model $\mathfrak{A} \models \psi$. It is obvious that $\mathfrak{A}$ contains a sequence of elements
$a_0, \ldots, a_{t-1}$ ($t \le s$) such that for all $i$ ($0 \le i < t$), there is a binary predicate $p$
with either $\mathfrak{A} \models p[a_i, a_{i+1}]$ or $\mathfrak{A} \models p[a_{i+1}, a_i]$ (where the addition in the indices
is modulo $t$). Let us call such a sequence a cycle. We therefore establish the
following 'big-cycles' lemma for $\mathcal{GC}^2$-formulas $\varphi$ (Lemma 15 below): if $\Delta \cup \{\varphi\}$
is (finitely) satisfiable, then, for arbitrarily large $\Omega \in \mathbb{N}$, $\Delta \cup \{\varphi\}$ has a (finite)
model in which no cycles with $t \le \Omega$ exist. It follows that $\Delta \cup \{\varphi\}$ is (finitely)
satisfiable if and only if $\Delta \cup \{\varphi, \neg\psi\}$ is (finitely) satisfiable. That is, the problem
$\mathcal{Q}_{\varphi, \psi}$ is the complement of the problem $\mathcal{S}_\varphi$, which, again, is in NP by Theorem 1;
similarly, mutatis mutandis, for finite satisfiability.
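
The case distinction above depends only on the shape of the graph $(V, E)$ induced by the query. The following Python sketch builds that graph from a list of binary atoms and tests whether it is a tree; the representation of atoms as `(predicate, y, z)` triples and the function names are assumptions made for illustration, not part of the original development.

```python
def query_graph(atoms):
    """Build (V, E) from atoms given as (predicate, y, z) triples.

    Edges are unordered pairs of distinct variables occurring together
    in some atom; repeated atoms on the same pair give a single edge.
    """
    vertices, edges = set(), set()
    for _, y, z in atoms:
        vertices.update((y, z))
        if y != z:
            edges.add(frozenset((y, z)))
    return vertices, edges


def is_connected(vertices, edges):
    """Depth-first search from an arbitrary vertex."""
    if not vertices:
        return True
    adjacent = {v: set() for v in vertices}
    for edge in edges:
        u, w = tuple(edge)
        adjacent[u].add(w)
        adjacent[w].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adjacent[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices


def is_tree(vertices, edges):
    """A connected graph is a tree exactly when |E| = |V| - 1."""
    return is_connected(vertices, edges) and len(edges) == len(vertices) - 1
```

A chain such as $p(x_1, x_2) \wedge q(x_2, x_3)$ gives a tree, while adding $r(x_3, x_1)$ closes a loop and puts the query in the second case.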

For satisfiability (as opposed to finite satisfiability), this 'big-cycles' lemma
is relatively straightforward, and close to the familiar fact that $\mathcal{GC}^2$ has the
'tree-model property' (see Kazakov [3, Theorem 1]). For finite satisfiability, however,
more work is required. We now proceed to lay the foundations for that work.

Lemma 8. Let $\varphi$ be a formula of $\mathcal{GC}^2$, $\mathfrak{A}$ a structure interpreting the signature
of $\varphi$, and $I$ a nonempty set. For $i \in I$, let $\mathfrak{A}_i$ be a copy of $\mathfrak{A}$, with the domains
$A_i$ pairwise disjoint. If $\varphi$ is satisfied in $\mathfrak{A}$, then it is satisfied in the structure
$\mathfrak{A}'$ with domain $A' = \bigcup_{i \in I} A_i$ and interpretations $q^{\mathfrak{A}'} = \bigcup_{i \in I} q^{\mathfrak{A}_i}$ for every
predicate $q$.

Proof. If $\theta : \{x, y\} \to A$ is any variable assignment, and $i \in I$, let $\theta_i$ be the
variable assignment which maps $x$ and $y$ to the corresponding elements in $A_i \subseteq
A'$. A routine structural induction on $\varphi$ shows that $\mathfrak{A} \models_\theta \varphi$ if and only if, for
some (equivalently, for all) $i \in I$, $\mathfrak{A}' \models_{\theta_i} \varphi$. □






It follows immediately that, if a formula of $\mathcal{GC}^2$ has a finite model, then it
has arbitrarily large finite models, and indeed infinite models.
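
The copying construction of Lemma 8 is easy to carry out on finite structures. The sketch below is a hedged illustration: it assumes a structure is given as a domain together with a dictionary from predicate names to sets of tuples, and it tags each element with the index of its copy to keep the domains disjoint. The name `disjoint_copies` is hypothetical.

```python
def disjoint_copies(domain, relations, index_set):
    """Return the union of pairwise disjoint copies of a finite structure.

    Each element a of copy i is represented as the pair (i, a), so the
    copies cannot overlap and no tuple relates elements of different copies.
    """
    new_domain = {(i, a) for i in index_set for a in domain}
    new_relations = {
        name: {tuple((i, a) for a in tup) for i in index_set for tup in tuples}
        for name, tuples in relations.items()
    }
    return new_domain, new_relations
```

Because no tuple crosses between copies, every element keeps exactly the witnesses it had in the original structure, which is the intuition behind the preservation of guarded counting formulas.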

As with $\mathcal{C}^2$, so too with $\mathcal{GC}^2$, we can limit the nesting of quantifiers.

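Lemma 9. Let $\varphi$ be a $\mathcal{GC}^2$-formula. There exist (i) a quantifier-free $\mathcal{GC}^2$-formula
$\alpha$ with $x$ as its only variable, (ii) binary predicates $e_1, \ldots, e_l$ and
$f_1, \ldots, f_m$ (different from $\approx$), (iii) quantifier-free $\mathcal{GC}^2$-formulas $\beta_1, \ldots, \beta_l$, (iv)
positive integers $C_1, \ldots, C_m$ with the following property. If $\varphi^*$ is the $\mathcal{GC}^2$-formula

$$\forall x\, \alpha \wedge \bigwedge_{1 \le h \le l} \forall x \forall y (e_h(x,y) \to (\beta_h \vee x \approx y)) \wedge \bigwedge_{1 \le i \le m} \forall x \exists_{=C_i} y (f_i(x,y) \wedge x \not\approx y), \qquad (4)$$

and $C = \max_h C_h$, then (i) $\varphi^* \models \varphi$, and (ii) any model of $\varphi$ over a domain
having at least $C + 1$ elements may be expanded to a model of $\varphi^*$.

Proof. Routine adaptation of standard techniques. See, e.g. Börger et al. [2],
p. 378. □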

In view of Lemma 9, we fix a signature $\Sigma^*$ of unary and binary predicates and
a $\mathcal{GC}^2$-sentence $\varphi^*$ over this signature, having the form (4). For the remainder
of Section 5, all structures will interpret the signature $\Sigma^*$. We refer to the
predicates $f_1, \ldots, f_m$ in (4) as the counting predicates of $\Sigma^*$; and we understand
the notions of message type, invertible message type, silent 2-type and vacuous
2-type as in Definition 3.

For the next definition, if $\pi$ is a 1-type we denote by $\pi[y/x]$ the set of formulas
obtained by replacing all occurrences of $x$ in $\pi$ by $y$. (Recall that 1-types, on our
definition, always involve the variable $x$: so, technically, $\pi[y/x]$ is not a 1-type.)

Definition 14. Let $\pi$ and $\pi'$ be 1-types over $\Sigma^*$. Denote by $\pi \times \pi'$ the vacuous
2-type

$$\pi \cup \pi'[y/x] \cup \{\neg q(x, y), \neg q(y, x) \mid q \text{ a binary predicate of } \Sigma^*\}.$$



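Lemma 10. Suppose $\mathfrak{A} \models \varphi^*$, and let $\hat{\mathfrak{A}}$ be the structure obtained by replacing
every silent 2-type in $\mathfrak{A}$ by the corresponding vacuous 2-type, that is:

$$\mathrm{tp}^{\hat{\mathfrak{A}}}[a,b] = \begin{cases} \mathrm{tp}^{\mathfrak{A}}[a] \times \mathrm{tp}^{\mathfrak{A}}[b] & \text{if } \mathrm{tp}^{\mathfrak{A}}[a,b] \text{ is silent} \\ \mathrm{tp}^{\mathfrak{A}}[a,b] & \text{otherwise.} \end{cases}$$

Then $\hat{\mathfrak{A}} \models \varphi^*$.

Proof. Since the 1-types of elements are the same in $\mathfrak{A}$ and $\hat{\mathfrak{A}}$, $\hat{\mathfrak{A}} \models \forall x\, \alpha$.
Since the only 2-types realized in $\hat{\mathfrak{A}}$ but not in $\mathfrak{A}$ are vacuous, and since the
guards $e_h$ are not satisfied by pairs of elements having vacuous 2-types,
$\hat{\mathfrak{A}} \models \bigwedge_{1 \le h \le l} \forall x \forall y (e_h(x,y) \to (\beta_h \vee x \approx y))$. Since all elements send the same
messages in $\mathfrak{A}$ and $\hat{\mathfrak{A}}$, $\hat{\mathfrak{A}} \models \bigwedge_{1 \le i \le m} \forall x \exists_{=C_i} y (f_i(x,y) \wedge x \not\approx y)$. □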

Lemma 11. Suppose that $\mathfrak{A} \models \varphi^*$, and that $B$ and $B'$ are disjoint subsets of $A$
such that $|B| \ge (mC)^2 + mC + 1$, and $|B'| \ge mC + 1$. Then there exist elements
$b \in B$ and $b' \in B'$ such that $\mathrm{tp}^{\mathfrak{A}}[b, b']$ is silent.

Proof. Pick any $B'_0 \subseteq B'$ such that $|B'_0| = mC + 1$. Now set

$$B_0 = \{b \in B \mid \text{for some } b' \in B'_0,\ b' \text{ sends a message to } b\}.$$

Since $\mathfrak{A} \models \varphi^*$, no element of $B'_0$ sends a message to more than $mC$ other
elements, and since $|B'_0| = mC + 1$, $|B_0| \le mC(mC + 1)$. But $|B| > mC(mC + 1)$;
so let $b \in B \setminus B_0$. Again, $b$ can send a message to at most $mC$ elements of $B'_0$, yet
$|B'_0| > mC$; so let $b'$ be an element of $B'_0$ to which $b$ does not send a message. □
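
The selection argument in this proof is effective, and can be traced on a small example. The Python sketch below follows the same steps, under the assumption that the message relation is given explicitly as a set `sends` of ordered pairs and that a pair is silent when neither element sends a message to the other; the function name `find_silent_pair` and the parameter `bound` (playing the role of $mC$) are hypothetical.

```python
def find_silent_pair(B, B_prime, sends, bound):
    """Pick b in B and b' in B' with no message between them in either direction.

    sends: set of pairs (u, v) meaning that u sends a message to v.
    bound: the maximum number of messages any element sends (mC in the text).
    Returns None only if the cardinality hypotheses of the lemma fail.
    """
    B0_prime = sorted(B_prime)[: bound + 1]          # a subset of size mC + 1
    B0 = {b for b in B if any((bp, b) in sends for bp in B0_prime)}
    candidates = sorted(set(B) - B0)                 # receive nothing from B0'
    if not candidates:
        return None
    b = candidates[0]
    for bp in B0_prime:
        if (b, bp) not in sends:                     # b sends nothing to bp
            return b, bp
    return None
```

With `bound = 1`, the lemma asks for at least three elements in $B$ and two in $B'$, and the search then always succeeds.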

The ensuing analysis hinges on the special notion of a 't-cycle', which we
now proceed to define. In the sequel, we employ the notions of path and cycle
in a graph $G$ in the usual way, where paths and cycles are not permitted to
encounter nodes more than once (except of course that cycles loop back to their
starting points). We take the length of a path $v_0, \ldots, v_l$ to be $l$, and the length
of a cycle $v_0, \ldots, v_l$ (where $v_l = v_0$) to be $l$. We insist that, by definition, all
cycles have length at least 3.

Definition 15. Let $\mathfrak{A}$ be any structure interpreting $\Sigma^*$ over a domain $A$; let
$O \subseteq A$; and let

$$E = \{(a, b) \in A^2 \mid a \ne b \text{ and either } \mathrm{tp}^{\mathfrak{A}}[a, b] \text{ is not vacuous or } a \text{ and } b \text{ are both in } O\},$$

so that $G = (A, E)$ is a graph. By a t-cycle in $(\mathfrak{A}, O)$, we mean a cycle in $G$
containing at least one node lying outside $O$. A t-cycle in $(\mathfrak{A}, O)$ is strong if, for
any consecutive pair of elements $a$ and $b$ in that cycle, either $a$ and $b$ are both
in $O$ or $\mathrm{tp}^{\mathfrak{A}}[a, b]$ is an invertible message-type.

To motivate these notions, think of $O$ as the set of 'observable elements'
of $A$: the elements that will interpret the constants in some set of ground,
function-free literals $\Delta$. By contrast, the elements of $A \setminus O$ are the 'theoretical'
elements, whose existence may perhaps be forced by the background
theory $\varphi^*$. A t-cycle is thus a cycle in the graph $G$ of Definition 15 which
involves at least one theoretical element.
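
For small finite structures, the t-cycles of Definition 15 can be enumerated by brute force. The following Python sketch is illustrative only: it assumes the non-vacuous pairs are supplied as an explicit list, treats the edge relation as undirected (vacuity of a 2-type does not depend on the order of the pair), and requires node labels that can be compared with one another. All function names are hypothetical, and the search is exponential in general.

```python
from itertools import combinations


def build_edges(nonvacuous_pairs, observable):
    """Edges join distinct nodes with a non-vacuous 2-type, plus all pairs inside O."""
    edges = {frozenset(p) for p in nonvacuous_pairs if p[0] != p[1]}
    edges |= {frozenset(p) for p in combinations(sorted(observable), 2)}
    return edges


def simple_cycles(nodes, edges):
    """Enumerate simple cycles of length at least 3, each as a set of edges."""
    adjacent = {v: set() for v in nodes}
    for edge in edges:
        u, w = tuple(edge)
        adjacent[u].add(w)
        adjacent[w].add(u)
    found = set()

    def extend(path):
        start, last = path[0], path[-1]
        for nxt in adjacent[last]:
            if nxt == start and len(path) >= 3:
                closed = path + [start]
                found.add(frozenset(frozenset(p) for p in zip(closed, closed[1:])))
            elif nxt not in path and nxt > start:
                # only grow through nodes larger than the start, so each
                # cycle is discovered from its smallest node
                extend(path + [nxt])

    for v in sorted(nodes):
        extend([v])
    return found


def t_cycles(nodes, nonvacuous_pairs, observable):
    """Cycles of G that contain at least one node outside the observable set."""
    edges = build_edges(nonvacuous_pairs, observable)
    return [cycle for cycle in simple_cycles(nodes, edges)
            if not (set().union(*cycle) <= set(observable))]
```

The length of a cycle in this representation is its number of edges, so `len(cycle)` can be compared directly with a bound such as $\Omega$.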

Our first task is to show that, given any (finite) model $\mathfrak{A}$ of $\varphi^*$ and any
$O \subseteq A$, we can remove all 'short' strong t-cycles in $(\mathfrak{A}, O)$.

Lemma 12. Suppose $\mathfrak{A}_0 \models \varphi^*$; and let $O \subseteq A_0$ and $\Omega > 0$. We can find a
model $\mathfrak{B} \models \varphi^*$ such that: (i) $O \subseteq B$; (ii) $\mathfrak{A}_0|_O = \mathfrak{B}|_O$; and (iii) there are no
strong t-cycles in $(\mathfrak{B}, O)$ of length less than $\Omega$. Moreover, if $\mathfrak{A}_0$ is finite, then
we can ensure that $\mathfrak{B}$ is finite.

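Proof. Assume without loss of generality that $\Omega \ge 4$, let

$$K = 2(|O| + 1)((mC)^\Omega - 1)/(mC - 1) + 2,$$

and let $\mathfrak{A}_1, \ldots, \mathfrak{A}_K$ be isomorphic copies of $\mathfrak{A}_0$, with $A_i \cap A_j = \emptyset$ for all $i, j$
($0 \le i < j \le K$). Let $\mathfrak{A}$ (with domain $A$) be the union of $\mathfrak{A}_0$ together with all
of these copies. Formally:

$$A = \bigcup_{0 \le i \le K} A_i$$

$$q^{\mathfrak{A}} = \bigcup_{0 \le i \le K} q^{\mathfrak{A}_i} \text{ for any predicate } q.$$

By Lemma 8, $\mathfrak{A} \models \varphi^*$. (Here, we require that $\varphi^*$ is in $\mathcal{GC}^2$, not just in $\mathcal{C}^2$.)
Moreover, if any element of $A$ sends a message of type $\mu$ in $\mathfrak{A}$, then at least $K$
elements of $A \setminus O$ do so.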

For $a, b \in A$, let us say that $b$ is directly accessible from $a$ if either (i) $a = b$,
(ii) $\mathrm{tp}^{\mathfrak{A}}[a, b]$ is a message-type (not necessarily invertible), or (iii) $a$ and $b$ are
both in $O$; further, let us say that $b$ is accessible from $a$ in $l$ steps, if there exists
a sequence of elements $a_0, \ldots, a_l$ of $A$ such that $a_0 = a$, $a_l = b$ and, for all $i$
($0 \le i < l$), $a_{i+1}$ is directly accessible from $a_i$. If $a \in A$, the number of elements
accessible from $a$ in $l$ steps is certainly bounded by $(|O| + 1) \sum_{0 \le i \le l} (mC)^i$.

Suppose then

$$\gamma = a_0, a_1, a_2, \ldots, a_0$$

is a strong t-cycle in $(\mathfrak{A}, O)$ of minimal length $l < \Omega$; and assume, without
loss of generality, that $a_0 \notin O$. We modify $\mathfrak{A}$ (without affecting $\mathfrak{A}|_O$) so as to
destroy this t-cycle, taking care only to create new strong t-cycles of greater
length. Let $a = a_0$ and $b = a_1$, and let $\mu$ be the invertible message-type such
that $\mathrm{tp}^{\mathfrak{A}}[a, b] = \mu$.

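Claim. There exist pairwise distinct elements $c, d, e, f \in A \setminus O$ such that

1. $\mathrm{tp}^{\mathfrak{A}}[c, d] = \mu$;

2. neither $c$ nor $d$ is accessible from either $a$ or $b$ in $\Omega - 2$ steps;

3. $\mathrm{tp}^{\mathfrak{A}}[e] = \mathrm{tp}^{\mathfrak{A}}[a]$, and $\mathrm{tp}^{\mathfrak{A}}[f] = \mathrm{tp}^{\mathfrak{A}}[b]$;

4. $\mathrm{tp}^{\mathfrak{A}}[e, f]$ is silent;

5. $\mathrm{tp}^{\mathfrak{A}}[d, e]$ is not a message-type.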

Figure 1: The configuration of the claim in the proof of Lemma 12. An arrow on
a line indicates a message-type; absence of an arrow on a line indicates a
non-message type; a parenthetical arrow on a line indicates a 2-type which may or
may not be a message-type. For definiteness, $e$ and $f$ have been drawn outside
the set of elements accessible from $a$ or $b$ in $\Omega - 2$ steps; however, this is not
required by the claim.



Proof of Claim. Refer to Fig. 1. The number of elements of $A \setminus O$ accessible
from either $a$ or $b$ in $\Omega - 1$ steps is bounded by

$$2(|O| + 1)\Big(\sum_{0 \le i \le \Omega - 1} (mC)^i\Big) = 2(|O| + 1)((mC)^\Omega - 1)/(mC - 1) < K.$$

So choose $c \in A \setminus O$ such that $c$ sends a message of type $\mu$, and $c$ is not accessible
from either $a$ or $b$ in $\Omega - 1$ steps; and choose $d \in A$ such that $\mathrm{tp}^{\mathfrak{A}}[c, d] = \mu$. It
follows that $d$ is not accessible from $a$ or $b$ in $\Omega - 2$ steps. Let $E$ be the set of
elements of $A \setminus O$ having the same 1-type as $a$, and $F$ the set of elements of
$A \setminus O$ having the same 1-type as $b$. Now, $E$ and $F$ have cardinality at least $K$
where, since $\Omega \ge 4$,

$$K \ge 2((mC)^4 - 1)/(mC - 1) + 2 = 2((mC)^3 + (mC)^2 + mC + 2).$$

Hence $|E \setminus \{a,b,c,d\}| \ge 2mC((mC)^2 + mC + 1)$; and similarly, $|F \setminus \{a,b,c,d\}| \ge
2mC((mC)^2 + mC + 1)$. Therefore, we may select subsets $E_1, \ldots, E_{mC}$ of $E \setminus
\{a, b, c, d\}$ and subsets $F_1, \ldots, F_{mC}$ of $F \setminus \{a, b, c, d\}$, each containing at least
$(mC)^2 + mC + 1$ elements, and with these $2mC$ sets pairwise disjoint. Applying
Lemma 11 to $E_i$ and $F_i$ for all $i$ ($1 \le i \le mC$), select $e_i \in E_i$ and $f_i \in F_i$ such
that $\mathrm{tp}^{\mathfrak{A}}[e_i, f_i]$ is silent. But $d$ cannot send a message to more than $mC - 1$ of
the $e_i$ (since it already sends a message to $c$), so we may pick $e$ to be some $e_i$
such that $\mathrm{tp}^{\mathfrak{A}}[d, e_i]$ is not a message-type, and $f$ to be the corresponding $f_i$. The
elements $c$, $d$, $e$ and $f$ then have all the properties required by the claim. □

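Having obtained $c$, $d$, $e$, $f$, and returning to the proof of the lemma, we modify
$\mathfrak{A}$ so as to ensure that the 2-type connecting $a$ and $d$ is silent. (Note that $\mathrm{tp}^{\mathfrak{A}}[a, d]$
is certainly not a message-type, but $\mathrm{tp}^{\mathfrak{A}}[d, a]$ might be.) More precisely, we
define the structure $\mathfrak{A}'$ over $A$ to be exactly like $\mathfrak{A}$ except that

$$\mathrm{tp}^{\mathfrak{A}'}[a,d] = \mathrm{tp}^{\mathfrak{A}}[e,f]$$
$$\mathrm{tp}^{\mathfrak{A}'}[e,d] = \mathrm{tp}^{\mathfrak{A}}[a,d]$$
$$\mathrm{tp}^{\mathfrak{A}'}[e,f] = \mathrm{tp}^{\mathfrak{A}}[e,d].$$

Figure 2: Ensuring that $\mathrm{tp}^{\mathfrak{A}'}[a, d]$ is silent. Types displayed in the drawing of
$\mathfrak{A}'$ are to be read left-to-right: thus, $\mathrm{tp}^{\mathfrak{A}'}[a, d] = \mathrm{tp}^{\mathfrak{A}}[e, f]$, $\mathrm{tp}^{\mathfrak{A}'}[e, d] = \mathrm{tp}^{\mathfrak{A}}[a, d]$,
and $\mathrm{tp}^{\mathfrak{A}'}[e, f] = \mathrm{tp}^{\mathfrak{A}}[e, d]$. Lines and arrows are interpreted as in Fig. 1.

The transformation of $\mathfrak{A}$ into $\mathfrak{A}'$ is depicted in Fig. 2. The elements $a$, $c$
and $e$ all have the same 1-type in $\mathfrak{A}$; similarly for $b$, $d$ and $f$. Therefore,
these type-assignments are legitimate, and do not affect the 1-types of any
elements, whence $\mathfrak{A}' \models \forall x\, \alpha$. Since no new 2-types are introduced, $\mathfrak{A}' \models
\bigwedge_{1 \le h \le l} \forall x \forall y (e_h(x,y) \to (\beta_h \vee x \approx y))$. By inspection of Fig. 2, every element
sends the same messages in $\mathfrak{A}'$ as in $\mathfrak{A}$ (though to different elements), whence
$\mathfrak{A}' \models \bigwedge_{1 \le i \le m} \forall x \exists_{=C_i} y (f_i(x,y) \wedge x \not\approx y)$. Thus, $\mathfrak{A}' \models \varphi^*$. Since $a, e \notin O$,
$\mathfrak{A}'|_O = \mathfrak{A}|_O$; and by construction, $\mathrm{tp}^{\mathfrak{A}'}[a, d]$ is silent. Note also that $\mathfrak{A}$ and
$\mathfrak{A}'$ never differ with respect to any invertible message-types: in particular, the
strong t-cycles in $(\mathfrak{A}, O)$ are exactly the strong t-cycles in $(\mathfrak{A}', O)$.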

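We are now ready to destroy the strong t-cycle $\gamma$ in $(\mathfrak{A}', O)$. Let $\mathfrak{A}''$ be
exactly like $\mathfrak{A}'$, except that

$$\mathrm{tp}^{\mathfrak{A}''}[a,b] = \mathrm{tp}^{\mathfrak{A}'}[a,d] \qquad \mathrm{tp}^{\mathfrak{A}''}[a,d] = \mathrm{tp}^{\mathfrak{A}'}[a,b]$$

$$\mathrm{tp}^{\mathfrak{A}''}[c,b] = \mathrm{tp}^{\mathfrak{A}'}[c,d] \qquad \mathrm{tp}^{\mathfrak{A}''}[c,d] = \mathrm{tp}^{\mathfrak{A}'}[c,b].$$

The transformation of $\mathfrak{A}'$ into $\mathfrak{A}''$ is depicted in Fig. 3. Again, these assignments
are legitimate, with 1-types unaffected; no new 2-types are introduced; and
every element of $A$ sends the same messages in $\mathfrak{A}''$ as it does in $\mathfrak{A}'$ (though to
different elements). Thus $\mathfrak{A}'' \models \varphi^*$. Since $a, c \notin O$, $\mathfrak{A}''|_O = \mathfrak{A}'|_O = \mathfrak{A}|_O$; and by
construction, $\gamma$ is not a strong t-cycle in $(\mathfrak{A}'', O)$. Moreover, we claim that any
sequence $\gamma'$ which is a strong t-cycle in $(\mathfrak{A}'', O)$, but not in $(\mathfrak{A}', O)$, is longer
than $\gamma$. To show this, we suppose $|\gamma'| \le |\gamma| < \Omega$, and derive a contradiction.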

Figure 3: Destroying a strong t-cycle: the two-types in $\mathfrak{A}''$ are to be read from
left to right; thus, $\mathrm{tp}^{\mathfrak{A}''}[a,b] = \mathrm{tp}^{\mathfrak{A}'}[a,d]$, $\mathrm{tp}^{\mathfrak{A}''}[a,d] = \mathrm{tp}^{\mathfrak{A}'}[a,b]$, $\mathrm{tp}^{\mathfrak{A}''}[c,b] =
\mathrm{tp}^{\mathfrak{A}'}[c,d]$ and $\mathrm{tp}^{\mathfrak{A}''}[c,d] = \mathrm{tp}^{\mathfrak{A}'}[c,b]$. Lines and arrows are interpreted as in
Fig. 1.


Since $\gamma'$ is not a strong t-cycle in $(\mathfrak{A}', O)$, at least one of the pairs $(a, d)$, $(d, a)$,
$(b, c)$ or $(c, b)$ is consecutive in $\gamma'$; so suppose, without loss of generality, that
$(a, d)$ is. Indeed, by starting the cycle $\gamma'$ at $d$, we may write

$$\gamma' = d, \ldots, a, d.$$

Now $b$ certainly occurs in $\gamma'$. For otherwise, all consecutive pairs of $\gamma'$ except
$(a, d)$ send each other messages in $\mathfrak{A}'$, contradicting the fact that $d$ is not
accessible from $a$ in $\Omega - 2$ steps. In fact, an exactly similar argument shows that
$(c, b)$ occurs as a consecutive pair in $\gamma'$, since $d$ is not accessible from $b$ in $\Omega - 2$
steps either. Thus, we may write:

$$\gamma' = d, c_1, \ldots, c_s, c, b, b_1, \ldots, b_t, a, d,$$

($s, t \ge 0$). Returning to the structure $\mathfrak{A}'$, then, we see that

$$\gamma_1 = d, c_1, \ldots, c_s, c, d$$

$$\gamma_2 = b, b_1, \ldots, b_t, a, b$$

are strong t-cycles in $(\mathfrak{A}', O)$; and so, by the minimality of $\gamma$ in $\mathfrak{A}'$, we have
$s + 2 \ge |\gamma|$ and $t + 2 \ge |\gamma|$. It follows that $|\gamma'| = s + t + 4 \ge 2|\gamma| > |\gamma|$, a
contradiction.

Thus, in transforming $\mathfrak{A}$ into $\mathfrak{A}''$, we destroy one strong t-cycle of length less
than $\Omega$, and create only longer strong t-cycles. Proceeding in this way, then, we
eventually destroy all strong t-cycles of length less than $\Omega$. □

Our next task is to show that, given any (finite) model $\mathfrak{A}$ of $\varphi^*$ and any
$O \subseteq A$, we can remove all 'short' t-cycles in $(\mathfrak{A}, O)$, strong or otherwise.

Lemma 13. Suppose $\mathfrak{A}_0 \models \varphi^*$; and let $O \subseteq A_0$ and $\Omega > 0$. We can find a
model $\mathfrak{B} \models \varphi^*$ such that: (i) $O \subseteq B$; (ii) $\mathfrak{A}_0|_O = \mathfrak{B}|_O$; and (iii) there are no
t-cycles in $(\mathfrak{B}, O)$ of length less than $\Omega$. Moreover, if $\mathfrak{A}_0$ is finite, then we can
ensure that $\mathfrak{B}$ is finite.

Figure 4: Organization of $\mathfrak{A}^*$ as a tree of copies of $\mathfrak{A}_0$, in the case where $Y = |S|$
is finite; for legibility, the elements of $S$ are numbered, arbitrarily, as $s_1, \ldots, s_Y$.

Proof. By Lemma 12, let $\mathfrak{A}$ be a finite or countable model of $\varphi^*$, with $\mathfrak{A}$ finite
if $\mathfrak{A}_0$ is, such that: (i) $O \subseteq A$; (ii) $\mathfrak{A}_0|_O = \mathfrak{A}|_O$; and (iii) there are no strong
t-cycles in $(\mathfrak{A}, O)$ of length less than $\Omega$. Let

$$S = \{(a,b) \in A^2 \mid a \ne b \text{ and } \mathrm{tp}^{\mathfrak{A}}[a, b] \text{ is a non-invertible message-type}\},$$

and let $Y = |S|$. Obviously, if $\mathfrak{A}$ is finite, then so is $Y$. In addition, let $S^{\le\Omega}$
be the set of sequences of elements of $S$ of length $\le \Omega$. We denote the length
of $\sigma \in S^{\le\Omega}$ by $|\sigma|$; we write the empty sequence as $\epsilon$ and the concatenation of
sequences $\sigma$ and $\tau$ as $\sigma\tau$; as usual, we identify sequences of length 1 with the
corresponding elements of $S$.

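Let $\mathfrak{A}_\epsilon = \mathfrak{A}$. For $\sigma \in S^{\leq\Omega} \setminus \{\epsilon\}$, let $\mathfrak{A}_\sigma$ be a new copy of $\mathfrak{A}$, with domain $A_\sigma$;
and for any $a \in A$, denote by $a_\sigma$ the corresponding element of $A_\sigma$. We assume
that the $A_\sigma$ ($\sigma \in S^{\leq\Omega}$) are pairwise disjoint. Now let $\mathfrak{A}^*$ be given by:

$$A^* = \bigcup_{\sigma \in S^{\leq\Omega}} A_\sigma$$

$$q^{\mathfrak{A}^*} = \bigcup_{\sigma \in S^{\leq\Omega}} q^{\mathfrak{A}_\sigma} \quad \text{for any predicate } q.$$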

Note that $O \subseteq A \subseteq A^*$. We may picture $\mathfrak{A}^*$ as a tree of copies of $\mathfrak{A}$, with $\mathfrak{A}_\epsilon = \mathfrak{A}$
at the root, and having branching factor $\Upsilon$. We notionally divide the tree into
tiers, taking the root to be the first tier, and the leaves to be the $(\Omega + 1)$th tier.
The case where $\Upsilon$ is finite is illustrated in Fig. 4; the case where $\Upsilon = \aleph_0$ may
be pictured analogously. By Lemma 5, $\mathfrak{A}^* \models \varphi^*$. (Here, we require that $\varphi^*$ is in
$\mathcal{GC}^2$, not just in $\mathcal{C}^2$.) Moreover, there are no strong t-cycles in $(\mathfrak{A}^*, O)$ of length
less than $\Omega$.

We modify $\mathfrak{A}^*$ as follows to obtain a structure $\mathfrak{B}$ over the domain $B = A^*$.
As a first (easy) step, if $a$ and $b$ are any distinct elements of $A^*$, not both
in $O$, such that $\mathrm{tp}^{\mathfrak{A}^*}[a, b]$ is silent but not vacuous, we can apply Lemma 10
and replace $\mathrm{tp}^{\mathfrak{A}^*}[a, b]$ with the vacuous 2-type $\mathrm{tp}^{\mathfrak{A}^*}[a] \times \mathrm{tp}^{\mathfrak{A}^*}[b]$. (Notice that this
transformation does not affect $\mathfrak{A}^*|_O$.) Hence, we may assume that, if $(a, b)$ is
a consecutive pair in some t-cycle in $(\mathfrak{A}^*, O)$, with $a, b$ not both in $O$, then at
least one of $\mathrm{tp}^{\mathfrak{A}^*}[a, b]$ and $\mathrm{tp}^{\mathfrak{A}^*}[b, a]$ is a message-type. Furthermore, since there
are no strong t-cycles in $(\mathfrak{A}^*, O)$ of length less than $\Omega$, any t-cycle in $(\mathfrak{A}^*, O)$ of
length less than $\Omega$ contains at least one consecutive pair $(a, b)$, such that: (i)
$a$ and $b$ are not both in $O$, and (ii) exactly one of $\mathrm{tp}^{\mathfrak{A}^*}[a, b]$ and $\mathrm{tp}^{\mathfrak{A}^*}[b, a]$ is a
message-type (and hence a non-invertible message-type).

We obtain $\mathfrak{B}$ from $\mathfrak{A}^*$ by re-directing non-invertible messages in successive
tiers of the tree in Fig. 4 as follows. First, we consider the structure $\mathfrak{A}_\epsilon = \mathfrak{A}$ at
the root of the tree. Let $a, b$ be any distinct elements of $A$, not both in $O$. If
$\mathrm{tp}^{\mathfrak{A}}[a, b]$ is a non-invertible message-type $\mu$, then we divert the message which
$a$ sends to $b$ in $\mathfrak{A}^*$ so that it instead points to the element corresponding to $b$
in the structure at the $(a, b)$th position in the second tier of the tree in Fig. 4.
Formally, we set

$$\mathrm{tp}^{\mathfrak{B}}[a, b] = \mathrm{tp}^{\mathfrak{A}^*}[a] \times \mathrm{tp}^{\mathfrak{A}^*}[b]$$
$$\mathrm{tp}^{\mathfrak{B}}[a, b_{(a,b)}] = \mathrm{tp}^{\mathfrak{A}^*}[a, b].$$

Otherwise, we leave the elements of $\mathfrak{A}_\epsilon$ unaffected. This transformation is
depicted in Fig. 5.

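Next, we consider the copies of $\mathfrak{A}$ in tiers 2 to $\Omega$: i.e. those $\mathfrak{A}_\sigma$ such that
$1 \leq |\sigma| < \Omega$. Let $a, b$ be any distinct elements of $A$. If $\mathrm{tp}^{\mathfrak{A}}[a, b]$ is a non-invertible
message-type $\mu$, then we divert the message which $a_\sigma$ sends to $b_\sigma$ in $\mathfrak{A}^*$ so that
it instead points to the element corresponding to $b$ in the copy of $\mathfrak{A}$ located at
the $(a, b)$th daughter of $\mathfrak{A}_\sigma$. Formally, we set

$$\mathrm{tp}^{\mathfrak{B}}[a_\sigma, b_\sigma] = \mathrm{tp}^{\mathfrak{A}^*}[a_\sigma] \times \mathrm{tp}^{\mathfrak{A}^*}[b_\sigma]$$
$$\mathrm{tp}^{\mathfrak{B}}[a_\sigma, b_{\sigma(a,b)}] = \mathrm{tp}^{\mathfrak{A}^*}[a_\sigma, b_\sigma].$$

Otherwise, we leave the elements of $\mathfrak{A}_\sigma$ unaffected.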

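Finally, we consider the copies of $\mathfrak{A}$ in the bottom tier: i.e. those $\mathfrak{A}_\sigma$ such
that $|\sigma| = \Omega$. Let $a, b$ be any distinct elements of $A$. If $\mathrm{tp}^{\mathfrak{A}}[a, b]$ is a
non-invertible message-type $\mu$, then we divert the message which $a_\sigma$ sends to $b_\sigma$
in $\mathfrak{A}^*$ so that it instead loops back to the element corresponding to $b$ in the
structure located at the $(a, b)$th node of the second tier of the tree. Formally,
we set

$$\mathrm{tp}^{\mathfrak{B}}[a_\sigma, b_\sigma] = \mathrm{tp}^{\mathfrak{A}^*}[a_\sigma] \times \mathrm{tp}^{\mathfrak{A}^*}[b_\sigma]$$
$$\mathrm{tp}^{\mathfrak{B}}[a_\sigma, b_{(a,b)}] = \mathrm{tp}^{\mathfrak{A}^*}[a_\sigma, b_\sigma].$$

Otherwise, we leave the elements of $\mathfrak{A}_\sigma$ unaffected.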
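
The three cases above differ only in which copy of the structure receives the diverted message. The following Python sketch records that bookkeeping; it is an illustration only, and the names `redirect_target`, `Pair` and `Node` are hypothetical, with a sequence $\sigma$ modelled as a tuple of pairs and the empty tuple standing for $\epsilon$.

```python
from typing import Tuple

Pair = Tuple[str, str]      # an element (a, b) of S
Node = Tuple[Pair, ...]     # a sequence sigma over S; () plays the role of epsilon


def redirect_target(sigma: Node, pair: Pair, omega: int) -> Node:
    """Index of the copy receiving the message a_sigma -> b_sigma, where pair = (a, b)."""
    if len(sigma) > omega:
        raise ValueError("sigma must have length at most omega")
    if len(sigma) < omega:
        # Root and tiers 2..omega: send to the (a, b)th daughter of the current copy.
        return sigma + (pair,)
    # Bottom tier (|sigma| = omega): loop back to the (a, b)th node of the second tier.
    return (pair,)
```

For instance, with `omega = 2`, a message diverted from the root lands in a second-tier copy, and a message diverted from a bottom-tier copy returns to the second tier rather than to the root.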

It is obvious that these assignments are legitimate, leave 1-types unaffected,
introduce no new 2-types, and leave the number of messages of each type sent
by any element unaffected. Hence, $\mathfrak{B} \models \varphi^*$. It is equally obvious that $\mathfrak{B}|_O =
\mathfrak{A}^*|_O = \mathfrak{A}_0|_O$, and that there are no t-cycles in $(\mathfrak{B}, O)$ of length less than $\Omega$. □

We remark that the method of removing short t-cycles used in Lemma 13
works only for cycles featuring non-invertible message-types. In particular,
the large 'fan-in' at elements of structures in the second tier requires that the
message-types being redirected are non-invertible.

Figure 5: Re-direction of non-invertible messages in $\mathfrak{A}_\epsilon$ in the proof of Lemma 13.

6 Data-complexity of query-answering and 
finite query-answering 

In this section, we prove that the query-answering and finite query-answering
problems with respect to a positive conjunctive query $\psi(\bar{y})$ and a formula $\varphi$ of
$\mathcal{GC}^2$ are in the class co-NP. Lemma 15 plays a key role in this proof, by allowing
us to re-write positive conjunctive queries as disjunctions of queries involving
only two variables (at which point we can apply Theorem 1). The remainder of
the proof is largely a matter of book-keeping.

We begin with a generalization of the observation that $\forall x \forall y\, \theta(x, y)$ is logically
equivalent to $\forall x\, \theta(x, x) \wedge \forall x \forall y (x \neq y \to \theta(x, y))$. We employ the following
notation. Fix some set of constants $K$ and tuple of variables $\bar{x} = x_1, \ldots, x_n$.
Let $\Xi$ be the set of all functions $\xi : \bar{x} \to \bar{x} \cup K$. For each $\xi \in \Xi$, denote by
$\bar{x}_\xi$ the (possibly empty) tuple of variables $\xi(x_1), \ldots, \xi(x_n)$ with all constants
and duplicates removed. Further, for any formula $\theta$, denote by $\theta_\xi$ the result of
simultaneously substituting the terms $\xi(x_1), \ldots, \xi(x_n)$ for all free occurrences
of the respective variables $x_1, \ldots, x_n$ in $\theta$.
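
To make the notation concrete, the following Python sketch enumerates the functions $\xi$ and computes the tuple $\bar{x}_\xi$ for each. It is only an illustration: the helper names `substitutions` and `surviving_variables` are hypothetical, terms are modelled as strings, and variable and constant names are assumed to be disjoint.

```python
from itertools import product


def substitutions(variables, constants):
    """Yield every function xi from the variables to variables-or-constants, as a dict."""
    terms = list(variables) + list(constants)
    for image in product(terms, repeat=len(variables)):
        yield dict(zip(variables, image))


def surviving_variables(xi, variables):
    """The tuple xi(x_1), ..., xi(x_n) with constants and duplicates removed."""
    seen = []
    for x in variables:
        t = xi[x]
        if t in variables and t not in seen:
            seen.append(t)
    return tuple(seen)
```

With $n$ variables and $|K|$ constants there are $(n + |K|)^n$ such functions, so for a fixed tuple of variables the enumeration is polynomial in the number of constants.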

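Lemma 14. Let $\bar{x}$ be a tuple of variables, $K$ a finite set of constants, and $\Xi$
the set of all functions $\xi : \bar{x} \to \bar{x} \cup K$. If $\theta$ is any formula, then $\forall \bar{x}\, \theta$ is logically
equivalent to

$$\bigwedge_{\xi \in \Xi} \forall \bar{x}_\xi \Big( \Big( \bigwedge_{x \in \bar{x}_\xi,\, c \in K} x \neq c \;\wedge \bigwedge_{x, x' \in \bar{x}_\xi,\, x \neq x'} x \neq x' \Big) \to \theta_\xi \Big). \tag{5}$$

In fact, let $\Xi_1$ and $\Xi_2$ be disjoint (possibly empty) subsets of $\Xi$ such that $\Xi_1 \cup \Xi_2 =
\Xi$. Then $\forall \bar{x}\, \theta$ is logically equivalent to

$$\bigwedge_{\xi \in \Xi_1} \forall \bar{x}_\xi \Big( \Big( \bigwedge_{x \in \bar{x}_\xi,\, c \in K} x \neq c \;\wedge \bigwedge_{x, x' \in \bar{x}_\xi,\, x \neq x'} x \neq x' \Big) \to \theta_\xi \Big) \;\wedge \bigwedge_{\xi \in \Xi_2} \forall \bar{x}_\xi\, \theta_\xi. \tag{6}$$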
Proof. Denote by $\varphi_1$ the formula (5), and by $\varphi_2$ the formula (6). It is obvious
that $\models \forall \bar{x}\, \theta \to \varphi_2$, $\models \varphi_2 \to \varphi_1$, and $\models \varphi_1 \to \forall \bar{x}\, \theta$. □

The next lemma allows us to remove individual constants from universally 
quantified formulas at the expense of adding some ground literals. 

Lemma 15. Let $\varphi$ be a formula, $\Theta$ a set of formulas, and $c$ an individual
constant. Let $p_c$ be a new unary predicate and $z$ a new variable ('new' means
'not occurring in $\varphi$ or $\Theta$'). Denote by $\varphi'$ the result of replacing all occurrences
of $c$ in $\varphi$ by $z$, and let $\psi$ be the formula $\forall z(\varphi' \vee \neg p_c(z))$. Then the sets of formulas
$\Theta \cup \{\varphi\}$ and $\Theta \cup \{p_c(c), \psi\}$ are satisfiable over the same domains.

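Proof. Obviously, $\{p_c(c), \psi\} \models \varphi$. On the other hand, if $\mathfrak{A} \models \Theta \cup \{\varphi\}$, expand
$\mathfrak{A}$ to a structure $\mathfrak{A}'$ by setting $p_c^{\mathfrak{A}'} = \{c^{\mathfrak{A}}\}$. □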

Recall that a clause is a disjunction of literals (with the empty clause, $\bot$,
allowed), and that a clause is negative if all its literals are negative. In the
sequel, we continue to confine attention to signatures involving only unary and 
binary predicates together with individual constants. 

Definition 16. Let $\eta$ be a clause, let $T$ be the set of terms (variables or
constants) occurring in $\eta$, and let

$$E = \{(t_1, t_2) \in T^2 \mid t_1 \neq t_2 \text{ and either } t_1, t_2 \text{ both occur in some literal of } \eta \text{ or } t_1 \text{ and } t_2 \text{ are both constants}\}.$$

Denote the graph $(T, E)$ by $G_\eta$. (We allow the empty graph for the case $\eta = \bot$.)
We say $\eta$ is v-cyclic if $G_\eta$ contains a cycle (in the usual graph-theoretic sense)
at least one of whose nodes is a variable; otherwise, we say $\eta$ is v-acyclic.
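
Definition 16 can be checked mechanically. The sketch below is one possible way to do so, not a procedure taken from the text: a clause is modelled as a list of `(predicate, args)` pairs (signs are irrelevant to the graph), the caller supplies the set of variable names, and the helper names are hypothetical. It relies on the standard fact that a node lies on a cycle exactly when some edge incident to it is not a bridge.

```python
from itertools import combinations


def clause_graph(clause, variables):
    """Return (T, E) for a clause given as a list of (predicate, args) pairs."""
    terms = {t for _, args in clause for t in args}
    edges = set()
    for _, args in clause:
        # Distinct terms occurring together in a literal are joined by an edge.
        for s, t in combinations(set(args), 2):
            edges.add(frozenset((s, t)))
    constants = [t for t in terms if t not in variables]
    # Any two distinct constants are joined by an edge.
    for c, d in combinations(constants, 2):
        edges.add(frozenset((c, d)))
    return terms, edges


def _connected(s, t, edges):
    """Depth-first search: is t reachable from s using the given edges?"""
    adjacent = {}
    for e in edges:
        u, v = tuple(e)
        adjacent.setdefault(u, set()).add(v)
        adjacent.setdefault(v, set()).add(u)
    stack, seen = [s], {s}
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adjacent.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def is_v_cyclic(clause, variables):
    """True if the clause graph has a cycle passing through at least one variable."""
    variables = set(variables)
    _, edges = clause_graph(clause, variables)
    for edge in edges:
        if not edge & variables:
            continue  # only edges with a variable endpoint matter
        s, t = tuple(edge)
        # The edge lies on a cycle iff its endpoints stay connected without it.
        if _connected(s, t, edges - {edge}):
            return True
    return False
```

Note that a clause such as $\neg r(x, c) \vee \neg r(x, d)$ counts as v-cyclic under this definition, because the two constants are themselves joined by an edge.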

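Definition 17. Let $K$ be a set of individual constants. A v-formula (with
respect to $K$) is a sentence of the form

$$\forall \bar{x} \Big( \Big( \bigwedge_{x \in \bar{x},\, c \in K} x \neq c \;\wedge \bigwedge_{x, x' \in \bar{x},\, x \neq x'} x \neq x' \Big) \to \eta \Big), \tag{7}$$

where $\eta$ is a v-cyclic negative clause.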

The intuition behind v-formulas is that they provide a counterpart to the
notion of a t-cycle in a pair $(\mathfrak{A}, O)$, given in Definition 15. Specifically:

Remark 3. Let $\mathfrak{A}$ be a structure, $K$ the set of individual constants interpreted
by $\mathfrak{A}$, and $O = \{c^{\mathfrak{A}} \mid c \in K\}$. Suppose that distinct individual constants in $K$
have distinct interpretations in $\mathfrak{A}$. Let $\upsilon$ be a v-formula with respect to $K$. If
$\mathfrak{A} \not\models \upsilon$, then there is a t-cycle in $(\mathfrak{A}, O)$ of length at most $\|\upsilon\|$.

Definition 18. Let $\eta$ be a clause. We call $\eta$ splittable if, by re-ordering its
literals, it can be written as $\eta_1 \vee \eta_2$, where $\mathrm{Vars}(\eta_1) \cap \mathrm{Vars}(\eta_2) = \emptyset$; otherwise,
$\eta$ is unsplittable.
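
Splittability amounts to asking whether the literals of a clause fall into more than one group when literals sharing a variable are grouped together. The Python sketch below illustrates this under an assumption not spelled out in the definition, namely that both $\eta_1$ and $\eta_2$ are required to be non-empty. The clause encoding (a list of `(predicate, args)` pairs plus a set of variable names) and the name `is_splittable` are hypothetical.

```python
def is_splittable(clause, variables):
    """True if the literals can be split into two non-empty, variable-disjoint parts."""
    variables = set(variables)
    parent = list(range(len(clause)))  # union-find forest over literal positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_literal_with = {}
    for i, (_, args) in enumerate(clause):
        for t in args:
            if t in variables:
                # Merge this literal with the first literal seen containing t.
                j = first_literal_with.setdefault(t, i)
                parent[find(i)] = find(j)
    groups = {find(i) for i in range(len(clause))}
    return len(groups) >= 2
```

Under this reading, a ground literal never shares a variable with anything, so any clause containing a ground literal together with at least one other literal is splittable.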
Remark 4. Let $\eta$ be a non-ground clause. If $\eta$ is unsplittable and v-acyclic,
then it contains at most one individual constant.

Lemma 16. Let $\eta(x, \bar{x})$ be a negative clause with no individual constants,
involving exactly the variables $x, \bar{x}$. Suppose further that $\eta(x, \bar{x})$ is non-empty,
unsplittable and v-acyclic. Then there exists a $\mathcal{GC}^2$-formula $\psi(x)$ such that
$\forall \bar{x}\, \eta(x, \bar{x})$ and $\psi(x)$ are logically equivalent.

Proof. We proceed by induction on the number of variables involved. If $\bar{x}$ is
the empty tuple, there is nothing to prove, so suppose otherwise. Since $\eta$ is
unsplittable and v-acyclic, and contains the variable $x$, $G_\eta$ may be viewed as
a tree with $x$ at the root. Let $x_1, \ldots, x_n$ be the immediate descendants of $x$
in the tree $G_\eta$. Further, for all $i$ ($1 \leq i \leq n$), let $\bar{x}_i$ be a (possibly empty)
tuple consisting of those variables in $\bar{x}$ which are proper descendants of $x_i$ in
$G_\eta$. Then $\forall \bar{x}\, \eta(x, \bar{x})$ is logically equivalent to some formula

$$\delta(x) \vee \bigvee_{1 \leq i \leq n} \forall x_i \big( \epsilon_i(x, x_i) \vee \forall \bar{x}_i\, \eta_i(x_i, \bar{x}_i) \big),$$

where $\delta(x)$ is a negative clause involving exactly the variables $\{x\}$, and, for
all $i$ ($1 \leq i \leq n$): (i) $\epsilon_i(x, x_i)$ is a non-empty negative clause each of whose
literals involves the variables $\{x, x_i\}$, and (ii) $\eta_i(x_i, \bar{x}_i)$ is a negative clause which
involves exactly the variables $\{x_i\} \cup \bar{x}_i$. By inductive hypothesis, there exists a
$\mathcal{GC}^2$-formula $\psi_i(x_i)$ logically equivalent to $\forall \bar{x}_i\, \eta_i(x_i, \bar{x}_i)$. But then $\forall \bar{x}\, \eta(x, \bar{x})$ is
logically equivalent to

$$\delta(x) \vee \bigvee_{1 \leq i \leq n} \forall y \big( \epsilon_i(x, y) \vee \psi_i(y) \big),$$

which in turn is trivially logically equivalent to a $\mathcal{GC}^2$-formula. □

Lemma 17. Let $\varphi$ be a $\mathcal{GC}^2$-formula, $\Delta$ a finite set of ground, function-free
literals, and $T$ a finite set of v-formulas. Suppose that $\Delta$ contains the literal $c \neq
d$ for all distinct individual constants $c, d$ occurring in $\Delta \cup T$. Then $\Delta \cup \{\varphi\} \cup T$
is (finitely) satisfiable if and only if $\Delta \cup \{\varphi\}$ is (finitely) satisfiable.

Proof. The only-if direction is trivial. So suppose $\mathfrak{A}_0^+$ is a (finite) model of
$\{\varphi\} \cup \Delta$, with domain $A_0$. Let $O \subseteq A_0$ be the set of elements interpreting
the individual constants in $\Delta \cup T$, and let $\mathfrak{A}_0$ be the reduct of $\mathfrak{A}_0^+$ obtained by
ignoring the interpretations of those individual constants.

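Let $\varphi^*$ and $C$ be obtained from $\varphi$ as in Lemma ID Let $\mathfrak{A}_1, \ldots, \mathfrak{A}_C$ be
isomorphic copies of $\mathfrak{A}_0$ with the domains $A_i$ ($0 \leq i \leq C$) pairwise disjoint; and
let $\mathfrak{A}$ be the union of these models as in Lemma 5. Thus, $O \subseteq A_0 \subseteq A$, $\mathfrak{A} \models \varphi$,
and $|A| > C$. By Lemma O let $\mathfrak{A}'$ be an expansion of $\mathfrak{A}$ such that $\mathfrak{A}' \models \varphi^*$.
Obviously, $\mathfrak{A}'$ is finite if $\mathfrak{A}_0^+$ is.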

Let $\Omega > \|\upsilon\|$ for all $\upsilon \in T$. Applying Lemma 13 to $\mathfrak{A}'$, let $\mathfrak{B}$ be a model
of $\varphi^*$ (and hence of $\varphi$), finite if $\mathfrak{A}'$ is finite, such that: (i) $O \subseteq B$; (ii) $\mathfrak{B}|_O =
\mathfrak{A}'|_O = \mathfrak{A}_0|_O$; and (iii) there are no t-cycles in $(\mathfrak{B}, O)$ of length less than $\Omega$. Let
$\mathfrak{B}^+$ be the expansion of $\mathfrak{B}$ obtained by interpreting any constants as in $\mathfrak{A}_0^+$.
Thus, $\mathfrak{B}^+ \models \Delta \cup \{\varphi\}$. If $\mathfrak{B}^+$ fails to satisfy some formula in $T$ of the form (7),
then, by Remark 3, there is a t-cycle in $(\mathfrak{B}, O)$ of length less than $\Omega$, which is
impossible. Hence $\mathfrak{B}^+ \models T$, as required. □

Theorem 4. For any $\mathcal{GC}^2$-sentence $\varphi$ and any positive conjunctive query $\psi(\bar{y})$,
both $\mathcal{Q}_{\varphi,\psi(\bar{y})}$ and $\mathcal{FQ}_{\varphi,\psi(\bar{y})}$ are in co-NP.

Proof. We give the proof for $\mathcal{FQ}_{\varphi,\psi(\bar{y})}$; the proof for $\mathcal{Q}_{\varphi,\psi(\bar{y})}$ is analogous.

Let an instance $(\Delta, \bar{a})$ of $\mathcal{FQ}_{\varphi,\psi(\bar{y})}$ be given, where $\Delta$ is a set of ground,
function-free literals, and $\bar{a}$ a tuple of individual constants. By re-naming
individual constants if necessary, we may assume that the constants $\bar{a}$ all have
codes of fixed length, so that $\|\bar{a}\|$ may be regarded as a constant. Let $n = \|\Delta\|$,
then. The instance $(\Delta, \bar{a})$ is positive if and only if $\psi(\bar{a})$ is true in every finite
model of $\Delta \cup \{\varphi\}$. Hence, it suffices to give a non-deterministic procedure for
determining the finite satisfiability of the formula

$$\bigwedge \Delta \wedge \varphi \wedge \neg\psi(\bar{a}), \tag{8}$$

running in time bounded by a polynomial function of $n$.

We may assume without loss of generality that all predicates in $\Delta$ occur in $\varphi$
or $\psi(\bar{y})$, since, provided $\Delta$ contains no direct contradictions, literals involving
foreign predicates can simply be removed. Further, we may assume that, for
every ground atom $\alpha$ over the relevant signature, $\Delta$ contains either $\alpha$ or $\neg\alpha$. For
if not, non-deterministically add either of these literals to $\Delta$; since all predicates
of $\varphi$ and $\psi(\bar{y})$ are by hypothesis of arity 1 or 2, this process may be carried out
in time bounded by a quadratic function of $n$. Finally, we may assume that, for
all distinct $c, d \in \mathrm{const}(\Delta) \cup \bar{a}$, $\Delta$ contains the literal $c \neq d$, since, if $\Delta$ contains
$c = d$, either of these constants can be eliminated.

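Since $\psi(\bar{y})$ is a positive conjunctive query, we may take $\neg\psi(\bar{a})$ to be $\forall \bar{x}\, \eta$,
where $\eta$ is a negative clause. Let $K = \mathrm{const}(\Delta) \cup \bar{a}$, and let $\Xi$ be the set of
functions from $\bar{x}$ to $\bar{x} \cup K$. Thus, $|\Xi| \leq (n + l_1 + l_2)^{l_1}$, where $l_1$ is the arity of
$\bar{x}$ and $l_2$ is the arity of $\bar{y}$. Employing the notation of Lemma 14, and recalling
Definition 16, let

$$\Xi_1 = \{\xi \in \Xi \mid \eta_\xi \text{ is v-cyclic}\}$$

$$\Xi_2 = \{\xi \in \Xi \mid \eta_\xi \text{ is v-acyclic}\}.$$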

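Thus, Formula (8) is logically equivalent to

$$\bigwedge \Delta \wedge \varphi \wedge \bigwedge_{\xi \in \Xi_1} \forall \bar{x}_\xi \Big( \Big( \bigwedge_{x \in \bar{x}_\xi,\, c \in K} x \neq c \;\wedge \bigwedge_{x, x' \in \bar{x}_\xi,\, x \neq x'} x \neq x' \Big) \to \eta_\xi \Big) \wedge \bigwedge_{\xi \in \Xi_2} \forall \bar{x}_\xi\, \eta_\xi; \tag{9}$$

moreover, this latter formula can be computed in time bounded by a polynomial
function of $|\Xi|$, and hence of $n$. Let us write (9) as

$$\bigwedge \Delta \wedge \varphi \wedge \bigwedge T \wedge \bigwedge_{\xi \in \Xi_2} \forall \bar{x}_\xi\, \eta_\xi; \tag{10}$$

where $T$ is a finite set of v-formulas with respect to $K$. Let $\eta^\circ_\xi$ denote $\top$ if any
ground literal of $\eta_\xi$ appears in $\Delta$; otherwise, let $\eta^\circ_\xi$ be the result of deleting
from $\eta_\xi$ all ground literals whose negation appears in $\Delta$. (If no literals remain,
$\eta^\circ_\xi$ is taken to be $\bot$.) Thus, (10) is logically equivalent to

$$\bigwedge \Delta \wedge \varphi \wedge \bigwedge T \wedge \bigwedge_{\xi \in \Xi_2} \forall \bar{x}_\xi\, \eta^\circ_\xi. \tag{11}$$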
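
The simplification of each clause against the data can be sketched as follows. This is an illustration under assumed encodings rather than the procedure of the proof itself: a literal is a triple `(sign, predicate, args)` with `sign` being `True` for a positive literal, the data set is a Python set of ground literals in the same encoding, and `TOP` and `simplify` are hypothetical names.

```python
TOP = "TOP"  # stands for the trivially true clause


def simplify(clause, delta, variables):
    """Simplify a clause against a set delta of ground literals.

    Returns TOP if some ground literal of the clause appears in delta;
    otherwise returns the clause with every ground literal whose negation
    appears in delta removed (an empty list stands for the empty clause).
    """
    variables = set(variables)
    kept = []
    for sign, pred, args in clause:
        ground = all(t not in variables for t in args)
        if ground and (sign, pred, args) in delta:
            return TOP
        if ground and (not sign, pred, args) in delta:
            continue  # this disjunct is false in every model of delta
        kept.append((sign, pred, args))
    return kept
```

When the data set decides every ground atom over the signature, as assumed in the proof, every ground literal is either matched or deleted, so the result is `TOP`, the empty clause, or a clause with no ground literals.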

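Since $\Delta$ contains every ground literal or its negation over the relevant signature,
no ground literal can appear in any of the $\eta^\circ_\xi$. Moreover, if any of the $\eta^\circ_\xi$ is
empty, (11) is trivially unsatisfiable; so we may suppose otherwise. List the
formulas $\forall \bar{x}_\xi\, \eta^\circ_\xi$, for $\xi \in \Xi_2$, as $\forall \bar{x}_i\, \eta_i$ ($1 \leq i \leq s$); and re-write each $\forall \bar{x}_i\, \eta_i$ as a
disjunction

$$\forall \bar{x}_{i,1}\, \eta_{i,1} \vee \cdots \vee \forall \bar{x}_{i,t_i}\, \eta_{i,t_i},$$

where the $\eta_{i,j}$ are unsplittable. For each $i$ ($1 \leq i \leq s$), pick a value $j$ ($1 \leq j \leq t_i$)
and write $\forall \bar{x}_{i,j}\, \eta_{i,j}$ as $\forall \bar{x}'_i\, \eta'_i$. Thus, (11) is finitely satisfiable if and only if, for
some way of making the above choices, the resulting formula

$$\bigwedge \Delta \wedge \varphi \wedge \bigwedge T \wedge \bigwedge_{1 \leq i \leq s} \forall \bar{x}'_i\, \eta'_i \tag{12}$$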

is finitely satisfiable. This (non-deterministic) step may again be executed in
time bounded by a polynomial function of $n$. Note that each $\eta'_i$ is v-acyclic,
unsplittable and non-ground; hence, by Remark 4, it contains at most one
individual constant. We may assume for simplicity, and without loss of generality,
that $\eta'_i$ contains exactly one individual constant, say $c_i$.

Let $\eta''_i$ be the result of replacing all occurrences of $c_i$ in $\eta'_i$ by $x$ (where $x$
does not occur in $\eta'_i$), and let $p_i$ be a new unary predicate depending only on the
clause $\eta''_i$ (and not on $i$): that is, if $\eta''_i = \eta''_j$, then $p_i = p_j$. Since $\eta'_i$ contains at
most one individual constant, $\eta''_i$ is a clause in the signature of $\psi(\bar{y})$; therefore,
the number of distinct predicates $p_i$ is bounded by some constant, independent
of $\Delta$. Let $\Delta' = \{p_i(c_i) \mid 1 \leq i \leq s\}$. By Lemma 15, then, (12) is satisfiable over
the same domains as

$$\bigwedge (\Delta \cup \Delta') \wedge \varphi \wedge \bigwedge T \wedge \bigwedge_{1 \leq i \leq s} \forall x \forall \bar{x}'_i \big( \eta''_i \vee \neg p_i(x) \big). \tag{13}$$

Evidently, (13) can be computed in time bounded by a polynomial function
of $n$; in particular, $|\Delta'|$ is also bounded in this way. However, the number
of formulas $\forall x \forall \bar{x}'_i (\eta''_i \vee \neg p_i(x))$ occurring in (13), assuming duplicates to be
omitted, is bounded by a constant. By Lemma 16, there exists, for each such
$\forall x \forall \bar{x}'_i (\eta''_i \vee \neg p_i(x))$, a logically equivalent $\mathcal{GC}^2$-formula $\forall x\, \theta_i(x)$. Let $\theta$ be the
conjunction of all these $\forall x\, \theta_i(x)$. Then (13) is logically equivalent to

$$\bigwedge (\Delta \cup \Delta') \wedge (\varphi \wedge \theta) \wedge \bigwedge T. \tag{14}$$

Finally, by Lemma 17, (14) is finitely satisfiable if and only if

$$\bigwedge (\Delta \cup \Delta') \wedge (\varphi \wedge \theta) \tag{15}$$

is finitely satisfiable. Since $(\varphi \wedge \theta)$ is one of a finite number $H$ of possible $\mathcal{GC}^2$-
(and hence $\mathcal{C}^2$-) formulas, where $H$ depends only on the signature of $\psi(\bar{y})$, and
not on $\Delta$, the finite satisfiability of (15) can be tested nondeterministically in
time bounded by a polynomial function of $n$, by Theorem 1. □

That the same complexity bounds are obtained for the query-answering and
finite query-answering problems in Theorem 4 is, incidentally, not something
that should be taken for granted. For example, Rosati [13] presents a relatively
simple logic (not a subset of $\mathcal{C}^2$) for which query-answering is always decidable,
but finite query-answering is in general undecidable.